We consider the problem of detecting communities or modules in networks, groups of vertices with
a higher-than-average density of edges connecting them. Previous work indicates that a robust
approach to this problem is the maximization of the benefit function known as "modularity" over
possible divisions of a network. Here we show that this maximization process can be written in
terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in
community detection similar to that played by the graph Laplacian in graph partitioning calculations.
This result leads us to a number of possible algorithms for detecting community structure, as
well as several other results, including a spectral measure of bipartite structure in networks and
a new centrality measure that identifies those vertices that occupy central positions within the
communities to which they belong. The algorithms and measures proposed are illustrated with
applications to a variety of real-world complex networks.
I. INTRODUCTION

Networks have attracted considerable recent attention in physics and
other fields as a foundation for the mathematical representation of a
variety of complex systems, including biological and social systems,
the Internet, the worldwide web, and many others. A common feature of
many networks is "community structure," the tendency for vertices to
divide into groups, with dense connections within groups and only
sparser connections between them. Social networks, biochemical
networks, and information networks such as the web have all been shown
to possess strong community structure, a finding that has substantial
practical implications for our understanding of the systems these
networks represent. Communities are of interest because they often
correspond to functional units such as cycles or pathways in metabolic
networks or collections of pages on a single topic on the web [10], but
their influence reaches further than this. A number of recent results
suggest that networks can have properties at the community level that
are quite different from their properties at the level of the entire
network, so that analyses that focus on whole networks and ignore
community structure may miss many interesting features.

For instance, in some social networks one finds individuals with
different mean numbers of contacts in different groups; the individuals
in one group might be gregarious, having many contacts with others,
while the individuals in another group might be more reticent. An
example of this behavior is seen in networks of sexual contacts, where
separate communities of high- and low-activity individuals have been
observed. If one were to characterize such a network by quoting only a
single figure for the average number of contacts an individual has, one
would be missing features of the network directly relevant to questions
of scientific interest such as epidemiological dynamics.

It has also been shown that vertices' positions within communities can
affect the role or function they assume. In social networks, for
example, it has long been accepted that individuals who lie on the
boundaries of communities, bridging gaps between otherwise unconnected
people, enjoy an unusual level of influence as the gatekeepers of
information flow between groups. A surprisingly similar result is found
in metabolic networks, where metabolites that straddle the boundaries
between modules show particular persistence across species. This
finding might indicate that modules in metabolic networks possess some
degree of functional independence within the cell, allowing vertices
central to a module to change or disappear with relatively little
effect on the rest of the network, while vertices on the borders of
modules are less able to change without affecting other aspects of the
cellular machinery.

One can also consider the communities in a network themselves to form a
higher level meta-network, a coarse-grained representation of the full
network. Such coarse-grained representations have been used in the past
as tools for visualization and analysis [18] but more recently have
also been investigated as topologically interesting entities in their
own right. In particular, networks of modules appear to have degree
distributions with interesting similarities to but also some
differences from the degree distributions of other networks [9], and
may also display so-called preferential attachment in their formation
[19], indicating the possibility of distinct dynamical processes taking
place at the level of the modules.

For all of these reasons and others besides there has been a concerted
effort in recent years within the physics community and elsewhere to
develop mathematical tools and computer algorithms to detect and
quantify community structure in networks. A huge variety of community
detection techniques have been developed, based variously on centrality
measures, flow models, random walks, resistor networks, optimization,
and many other approaches. Reviews of these methods can be found in the
literature.

In this paper we focus on one approach to community detection that has
proven particularly effective, the optimization of the benefit function
known as "modularity" over the possible divisions of a network. Methods
based on this approach have been found to produce excellent results in
standardized tests [37]. Unfortunately, exhaustive optimization of the
modularity demands an impractically large computational effort, but
good results have been obtained with various approximate optimization
techniques, including greedy algorithms, simulated annealing, and
extremal optimization [40]. In this paper we describe a different
approach, in which we rewrite the modularity function in matrix terms,
which allows us to express the optimization task as a spectral problem
in linear algebra. This approach leads to a family of fast new computer
algorithms for community detection that produce results competitive
with the best previous methods. Perhaps more importantly, our work also
leads to a number of useful insights about network structure via the
close relations we will demonstrate between communities and matrix
spectra.

Our work is by no means the first to find connections between divisions
of networks and matrix spectra. There is a large literature within
computer science on so-called spectral partitioning, in which network
properties are linked to the spectrum of the graph Laplacian matrix
[41, 42, 43]. This method is different from the one introduced here and
is not in general well suited to the problem of community structure
detection. The reasons for this, however, turn out to be interesting
and instructive, so we begin our presentation with a brief review of
the traditional spectral partitioning method in Section II. A
consideration of the deficiencies of this method in Section III leads
us in Sections IV to VI to introduce and develop at length our own
method, which is based on the characteristic matrix we call the
"modularity matrix." Sections VII and VIII explore some further ideas
arising from the study of the modularity matrix but not directly
related to community detection. In Section IX we give our conclusions.
A brief report of some of the results described in this paper has
appeared previously as Ref. [32].



II. GRAPH PARTITIONING AND THE LAPLACIAN MATRIX

There is a long tradition of research in computer sci- 
ence on graph partitioning, a problem that arises in a 
variety of contexts, but most prominently in the devel- 
opment of computer algorithms for parallel or distributed 
computation. Suppose a computation requires the per- 
formance of some number n of tasks, each to be carried 
out by a separate process, program, or thread running on 
one of c different computer processors. Typically there 
is a desired number of tasks or volume of work to be as- 
signed to each of the processors. If the processors are 
identical, for instance, and the tasks are of similar com- 
plexity, we may wish to assign the same number of tasks 
to each processor so as to share the workload roughly
equally. It is also typically the case that the individual
tasks require for their completion results generated dur- 
ing the performance of other tasks, so tasks must com- 
municate with one another to complete the overall com- 
putation. The pattern of required communications can 
be thought of as a network with n vertices representing 
the tasks and an edge joining any pair of tasks that need 
to communicate, for a total of m edges. (In theory the 
amount of communication between different pairs of tasks 
could vary, leading to a weighted network, but we here 
restrict our attention to the simplest unweighted case, 
which already presents interesting challenges.) 

Normally, communications between processors in par- 
allel computers are slow compared to data movement 
within processors, and hence we would like to keep such 
communications to a minimum. In network terms this 
means we would like to divide the vertices of our net- 
work (the processes) into groups (the processors) such 
that the number of edges between groups is minimized. 
This is the graph partitioning problem. 

Problems of this type can be solved exactly in polynomial time [44],
but unfortunately the polynomial in question is of leading order
$n^{c^2}$, which is already prohibitive for all but the smallest
networks even when c takes the smallest possible value of 2. For
practical applications, therefore, a number of approximate solution
methods have been developed that appear to give reasonably good
results. One of the most widely used is the spectral partitioning
method, due originally to Fiedler [41] and popularized particularly by
Pothen et al. [42]. We here consider the simplest instance of the
method, where c = 2, i.e., where our network is to be divided into just
two non-intersecting subsets such that the number of edges running
between the subsets is minimized.

We begin by defining the adjacency matrix A to be the matrix with
elements

\[
A_{ij} = \begin{cases}
  1 & \text{if there is an edge joining vertices } i, j, \\
  0 & \text{otherwise.}
\end{cases} \tag{1}
\]

(We restrict our attention in this paper to undirected networks, so
that A is symmetric.) Then the number of edges R running between our
two groups of vertices, also called the cut size, is given by

\[
R = \frac{1}{2} \sum_{\substack{i,j \text{ in} \\ \text{different groups}}} A_{ij}, \tag{2}
\]

where the factor of 1/2 compensates for our counting each edge twice in
the sum.
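As a concrete check of Eq. (2), the following minimal sketch counts the cut size directly from an adjacency matrix. The six-vertex example graph (two triangles joined by a single edge) and the function name `cut_size` are illustrative assumptions, not taken from the text.

```python
def cut_size(adj, groups):
    """Count the edges running between different groups, as in Eq. (2)."""
    n = len(adj)
    total = 0
    for i in range(n):
        for j in range(n):
            if groups[i] != groups[j]:
                total += adj[i][j]
    # Every between-group edge was counted twice, once as (i, j) and once as (j, i).
    return total // 2


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1  # undirected network, so A is symmetric

print(cut_size(adj, [1, 1, 1, 2, 2, 2]))  # division along the joining edge
print(cut_size(adj, [1, 2, 1, 2, 1, 2]))  # a division that ignores the triangles
```

The division along the joining edge cuts far fewer edges than the alternating one, which is the quantity graph partitioning tries to minimize.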

To put this in a more convenient form, we define an index vector s with
n elements

\[
s_i = \begin{cases}
  +1 & \text{if vertex } i \text{ belongs to group 1,} \\
  -1 & \text{if vertex } i \text{ belongs to group 2.}
\end{cases} \tag{3}
\]

(Note that s satisfies the normalization condition $s^T s = n$.) Then

\[
\tfrac{1}{2}(1 - s_i s_j) = \begin{cases}
  1 & \text{if } i \text{ and } j \text{ are in different groups,} \\
  0 & \text{if } i \text{ and } j \text{ are in the same group,}
\end{cases} \tag{4}
\]

which allows us to rewrite Eq. (2) as

\[
R = \frac{1}{4} \sum_{ij} (1 - s_i s_j) A_{ij}. \tag{5}
\]

Noting that the number of edges $k_i$ connected to a vertex i, also
called the degree of the vertex, is given by

\[
k_i = \sum_j A_{ij}, \tag{6}
\]

the first term of the sum in Eq. (5) is

\[
\sum_{ij} A_{ij} = \sum_i k_i = \sum_i s_i^2 k_i = \sum_{ij} s_i s_j k_i \delta_{ij}, \tag{7}
\]

where we have made use of $s_i^2 = 1$ (since $s_i = \pm 1$), and
$\delta_{ij}$ is 1 if i = j and zero otherwise. Thus

\[
R = \frac{1}{4} \sum_{ij} s_i s_j (k_i \delta_{ij} - A_{ij}). \tag{8}
\]

We can write this in matrix form as

\[
R = \frac{1}{4} s^T L s, \tag{9}
\]

where L is the real symmetric matrix with elements
$L_{ij} = k_i \delta_{ij} - A_{ij}$, or equivalently¹

\[
L_{ij} = \begin{cases}
  k_i & \text{if } i = j, \\
  -1 & \text{if } i \ne j \text{ and there is an edge } (i, j), \\
  0 & \text{otherwise.}
\end{cases} \tag{10}
\]

L is called the Laplacian matrix of the graph or sometimes the
admittance matrix. It appears in many contexts in the theory of
networks, such as the analysis of diffusion and random walks on
networks [45], Kirchhoff's theorem for the number of spanning trees
[46], and the dynamics of coupled oscillators. Its properties are the
subject of hundreds of papers in the mathematics and physics literature
and are by now quite well understood. For our purposes, however, we
will need only a few simple observations about the matrix to make
progress.
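The matrix form of the cut size, Eq. (9), can be checked numerically against the direct count of Eq. (2). The sketch below is illustrative only: the example graph and the helper names `laplacian`, `quadratic_form` and `cut_size` are assumptions introduced here.

```python
def laplacian(adj):
    """L_ij = k_i * delta_ij - A_ij, as in Eq. (10), for a simple undirected graph."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    return [[(degrees[i] if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(mat, s):
    """Return s^T M s for a square matrix M given as nested lists."""
    n = len(s)
    return sum(s[i] * mat[i][j] * s[j] for i in range(n) for j in range(n))


def cut_size(adj, s):
    """Direct count of Eq. (2): half the sum of A_ij over pairs in different groups."""
    n = len(s)
    return sum(adj[i][j] for i in range(n) for j in range(n) if s[i] != s[j]) // 2


# Hypothetical example: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

L = laplacian(adj)
s = [1, 1, 1, -1, -1, -1]  # index vector of Eq. (3)
print(quadratic_form(L, s) / 4, cut_size(adj, s))  # the two values should agree
```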

Our task is to choose the vector s so as to minimize the cut size,
Eq. (9). Let us write s as a linear combination of the normalized
eigenvectors $v_i$ of the Laplacian thus: $s = \sum_{i=1}^n a_i v_i$,
where $a_i = v_i^T s$ and the normalization $s^T s = n$ implies that

\[
\sum_{i=1}^n a_i^2 = n. \tag{11}
\]

Then

\[
R = \frac{1}{4} \sum_i a_i v_i^T \, L \sum_j a_j v_j
  = \frac{1}{4} \sum_{ij} a_i a_j \lambda_j \delta_{ij}
  = \frac{1}{4} \sum_i a_i^2 \lambda_i, \tag{12}
\]

where $\lambda_i$ is the eigenvalue of L corresponding to the
eigenvector $v_i$ and we have made use of $v_i^T v_j = \delta_{ij}$.
Without loss of generality, we assume that the eigenvalues are labeled
in increasing order $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.
The task of minimizing R can then be equated with the task of choosing
the nonnegative quantities $a_i^2$ so as to place as much as possible
of the weight in the sum (12) in the terms corresponding to the lowest
eigenvalues, and as little as possible in the terms corresponding to
the highest, while respecting the normalization constraint (11).

¹ We assume here that the network is a simple graph, having at most one
edge between any pair of vertices and no self-edges (edges that connect
vertices to themselves).

The sum of every row (and column) of the Laplacian matrix is zero:

\[
\sum_j L_{ij} = \sum_j (k_i \delta_{ij} - A_{ij}) = k_i - k_i = 0, \tag{13}
\]

where we have made use of Eq. (6). Thus the vector (1, 1, 1, ...) is
always an eigenvector of the Laplacian with eigenvalue zero. It is less
trivial, but still straightforward, to demonstrate that all eigenvalues
of the Laplacian are nonnegative. (The Laplacian is symmetric and can
be written as $L = B^T B$, where B is the oriented edge incidence
matrix, so that $x^T L x = |Bx|^2 \ge 0$ for any real vector x and
hence no eigenvalue can be negative.) Thus the eigenvalue 0 is always
the smallest eigenvalue of the Laplacian and the corresponding
eigenvector is $v_1 = (1, 1, 1, \ldots)/\sqrt{n}$, correctly normalized.
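Both properties used above, the vanishing row sums and the nonnegativity of the quadratic form, are easy to verify on a small case. The following sketch is only an illustration; the example graph, the test vector x and the function names are assumptions. It uses the fact that for an oriented incidence matrix B the quantity |Bx|^2 is a sum of (x_i - x_j)^2 over the edges.

```python
def laplacian_from_edges(n, edges):
    """Build the Laplacian of a simple undirected graph from its edge list."""
    lap = [[0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1  # each edge adds one to the degree of both endpoints
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    return lap


def quadratic_form(lap, x):
    """Return x^T L x."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum_of_squares(edges, x):
    """Return |Bx|^2, one squared difference per edge."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


# Hypothetical example: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
lap = laplacian_from_edges(6, edges)

row_sums = [sum(row) for row in lap]
x = [0.5, -1.0, 2.0, 3.0, -0.25, 1.0]  # arbitrary real vector

print(row_sums)                                  # every row sums to zero, Eq. (13)
print(quadratic_form(lap, x), edge_sum_of_squares(edges, x))
print(quadratic_form(lap, [1] * 6))              # the uniform vector gives zero
```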

Given these observations it is now straightforward to see how to
minimize the cut size R. If we choose s = (1, 1, 1, ...), then all of
the weight in the final sum in Eq. (12) is in the term corresponding to
the lowest eigenvalue $\lambda_1 = 0$ and all other terms are zero,
since (1, 1, 1, ...) is an eigenvector and the eigenvectors are
orthogonal. Thus this choice gives us R = 0, which is the smallest
value it can take since it is by definition a nonnegative quantity.

Unfortunately, when we consider the physical interpretation of this
solution, we see that it is trivial and uninteresting. Given the
definition of s in Eq. (3), the choice s = (1, 1, 1, ...) is equivalent
to placing all the vertices in group 1 and none of them in group 2.
Technically, this is a valid division of the network, but it is not a
useful one. Of course the cut size is zero if we put all the vertices
in one of the groups and none in the other, but such a trivial solution
tells us nothing about how to solve our original problem.

We would like to forbid this trivial solution, so as to force the
method to find a nontrivial one. A variety of ways have been explored
for achieving this goal, of which the most common is to fix the sizes
of the two groups, which is convenient if, as discussed above, the
sizes of the groups are specified anyway as a part of the problem. In
the present case, fixing the sizes of the groups fixes the coefficient
$a_1^2$ of the $\lambda_1$ term in the sum in Eq. (12): if the required
sizes of the groups are $n_1$ and $n_2$, then

\[
a_1^2 = (v_1^T s)^2 = \frac{(n_1 - n_2)^2}{n}. \tag{14}
\]

Since we cannot vary this coefficient, we shift our attention to the
other terms in the sum. If there were no further constraints on our
choice of s, apart from the normalization condition $s^T s = n$, our
course would be clear: R would be minimized by choosing s proportional
to the second eigenvector $v_2$ of the Laplacian, also called the
Fiedler vector. This choice places all of the weight in Eq. (12) in the
term involving the second-smallest eigenvalue $\lambda_2$, also known
as the algebraic connectivity. The other terms would automatically be
zero, since the eigenvectors are orthogonal.

Unfortunately, there is an additional constraint on s imposed by the
condition, Eq. (3), that its elements take the values ±1, which means
in most cases that s cannot be chosen parallel to $v_2$. This makes the
optimization problem much more difficult. Often, however, quite good
approximate solutions can be obtained by choosing s to be as close to
parallel with $v_2$ as possible. This means maximizing the quantity

\[
\bigl| v_2^T s \bigr| = \Bigl| \sum_i v_i^{(2)} s_i \Bigr| \le \sum_i \bigl| v_i^{(2)} \bigr|, \tag{15}
\]

where $v_i^{(2)}$ is the ith element of $v_2$. Here the second relation
follows via the triangle inequality, and becomes an equality only when
all terms in the first sum are positive (or negative). In other words,
the maximum of $|v_2^T s|$ is achieved when $v_i^{(2)} s_i \ge 0$ for
all i, or equivalently when $s_i$ has the same sign as $v_i^{(2)}$.
Thus the maximum is obtained with the choice

\[
s_i = \begin{cases}
  +1 & \text{if } v_i^{(2)} \ge 0, \\
  -1 & \text{if } v_i^{(2)} < 0.
\end{cases} \tag{16}
\]

Even this choice however is often forbidden by the condition that the
numbers of +1 and −1 elements of s be equal to the desired sizes $n_1$
and $n_2$ of the two groups, in which case the best solution is
achieved by assigning vertices to one of the groups in order of the
elements in the Fiedler vector, from most positive to most negative,
until the groups have the required sizes. For groups of different sizes
there are two distinct ways of doing this, one in which the smaller
group corresponds to the most positive elements of the vector and one
in which the larger group does. We can choose between them by
calculating the cut size R for both cases and keeping the one that
gives the better result.
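The procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the text does not say how the Fiedler vector is computed, so the sketch substitutes a plain power iteration on the shifted matrix cI - L with the uniform vector projected out (a library eigensolver would normally be used instead), and the example graph and all function names are hypothetical.

```python
import math


def laplacian(adj):
    """L_ij = k_i * delta_ij - A_ij for a simple undirected graph."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def cut_size(adj, s):
    """Number of edges between the +1 group and the -1 group."""
    n = len(s)
    return sum(adj[i][j] for i in range(n) for j in range(n) if s[i] != s[j]) // 2


def fiedler_vector(lap, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Assumption of this sketch: power iteration on M = c*I - L, where
    c = 2 * (maximum degree) is an upper bound on the eigenvalues of L,
    with the uniform eigenvector removed at every step.
    """
    n = len(lap)
    c = 2 * max(lap[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # arbitrary deterministic starting vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out (1, 1, 1, ...)
        y = [c * x[i] - sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def spectral_bisection(adj, n1):
    """Split into groups of sizes n1 and n - n1 by ordering the Fiedler vector."""
    n = len(adj)
    v2 = fiedler_vector(laplacian(adj))
    order = sorted(range(n), key=lambda i: v2[i], reverse=True)
    candidates = []
    # Try both assignments: group 1 takes the most positive or the most negative elements.
    for group1 in (order[:n1], order[n - n1:]):
        s = [-1] * n
        for i in group1:
            s[i] = 1
        candidates.append((cut_size(adj, s), s))
    return min(candidates, key=lambda item: item[0])


# Hypothetical example: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

cut, s = spectral_bisection(adj, 3)
print(cut, s)
```

Because the sign of an eigenvector is arbitrary, which triangle receives the label +1 depends on the starting vector; the division itself does not.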

This then is the spectral partitioning method in its simplest form. It
is not guaranteed to minimize R, but, particularly in cases where
$\lambda_2$ is well separated from the eigenvalues above it, it often
does very well. Figure 1 shows an example application typical of those
found in the literature, to a two-dimensional mesh such as might be
used in parallel finite-element calculations. This particular mesh is a
small 547-vertex example from Bern et al. and is shown complete in
panel (a) of the figure. Panel (b) shows the division of the mesh into
two parts of 273 and 274 vertices respectively using the spectral
partitioning approach, which finds a cut of size 46 edges in this case.

FIG. 1 (a) The mesh network of Bern et al. (b) The best division into
equal-sized parts found by the spectral partitioning algorithm based on
the Laplacian matrix.

Although the cut found in this example is a reasonable
one, it does not appear, at least to this author's eye,
that the vertex groups in Fig. 1(b) constitute any kind of
natural division of the network into "communities." This 
is typical of the problems to which spectral partitioning 
is usually applied: in most circumstances the network 
in question does not divide up easily into groups of the 
desired sizes, but one must do the best one can. For these 
types of tasks, spectral partitioning is an effective and 
appropriate tool. The task of finding natural community 
divisions in a network, however, is quite different, and 
demands a different approach, as we now discuss. 



III. COMMUNITY STRUCTURE AND MODULARITY 

Despite its evident success in the graph partitioning 
arena, spectral partitioning is a poor approach for de- 
tecting natural community structure in real-world net- 
works, which is the primary topic of this paper. The is- 
sue is with the condition that the sizes of the groups into 
which the network is divided be fixed. This condition 
is neither appropriate nor realistic for community detec- 
tion problems. In most cases we do not know in advance 
the sizes of the communities in a network and choosing 
arbitrary sizes will usually preclude us from finding the 
best solution to the problem. We would like instead to 
let the group sizes be free, but the spectral partitioning 
method breaks down if we do this, as we have seen: if
the group sizes are not fixed, then the minimum cut size 
is always achieved by putting all vertices in one group 
and none in the other. Indeed, this statement is con- 
siderably broader than the spectral partitioning method 
itself, since any method that correctly minimizes the cut 
size without constraint on the group sizes is sure to find, 
in the general case, that the minimum value is achieved 
for this same trivial division. 

Several approaches have been proposed to get around this problem. For
instance, the ratio cut method minimizes not the simple cut size R but
the ratio $R/(n_1 n_2)$, where $n_1$ and $n_2$ are again the sizes of
the two groups of vertices. This penalizes configurations in which
either of the groups is small and hence favors balanced divisions over
unbalanced ones, releasing us from the obligation to fix the group
sizes. Spectral algorithms based on ratio cuts have been proposed and
have proved useful for certain classes of partitioning problems. Still,
however, this approach effectively chooses the group sizes, at least
approximately, since it is biased in favor of divisions into
equal-sized parts. Variations are possible that are biased towards
other, unequal part sizes, but then one must choose those part sizes
and so again we have a situation in which we need to know in advance
the sizes of the groups if we are to get the "right" results. The ratio
cut method does allow some leeway for the sizes to vary around their
specified values, which makes it more flexible than the simple minimum
cut method, but at its core it still suffers from the same drawbacks
that make standard spectral partitioning inappropriate for community
detection.
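The contrast between the unconstrained minimum cut and the ratio cut can be seen by exhaustive search on a very small network. The sketch below is illustrative only: brute-force enumeration is an assumption made for clarity (it is feasible only for a handful of vertices), and the example graph and function names are hypothetical.

```python
from itertools import product


def cut_size(adj, s):
    """Number of edges between the +1 group and the -1 group."""
    n = len(s)
    return sum(adj[i][j] for i in range(n) for j in range(n) if s[i] != s[j]) // 2


def ratio_cut(adj, s):
    """R / (n1 * n2); undefined (None) when one group is empty."""
    n1 = s.count(1)
    n2 = len(s) - n1
    if n1 == 0 or n2 == 0:
        return None
    return cut_size(adj, s) / (n1 * n2)


def best_division(adj, objective):
    """Exhaustively minimize an objective over all two-way divisions."""
    n = len(adj)
    best = None
    for rest in product((1, -1), repeat=n - 1):
        s = (1,) + rest  # fix vertex 0 in group 1 so mirror images are not repeated
        value = objective(adj, s)
        if value is not None and (best is None or value < best[0]):
            best = (value, s)
    return best


# Hypothetical example: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

print(best_division(adj, cut_size))   # unconstrained cut size: the trivial division
print(best_division(adj, ratio_cut))  # ratio cut: a balanced, nontrivial division
```

On this example the ratio cut recovers the two triangles, but only because they happen to be of equal size, which is the bias discussed above.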

The fundamental problem with all of these methods 
is that cut sizes are simply not the right thing to opti- 
mize because they don't accurately reflect our intuitive 
concept of network communities. A good division of a 
network into communities is not merely one in which 
the number of edges running between groups is small. 
Rather, it is one in which the number of edges between 
groups is smaller than expected. Only if the number of 
between-group edges is significantly lower than would 
be expected purely by chance can we justifiably claim 
to have found significant community structure. Equiva- 
lently, we can examine the number of edges within com- 
munities and look for divisions of the network in which 
this number is higher than expected — the two approaches 
are equivalent since the total number of edges is fixed and 
any edges that do not lie between communities must nec- 
essarily lie inside them. 

These considerations lead us to shift our attention from measures based
on pure cut size to a modified benefit function Q defined by

\[
Q = (\text{number of edges within communities}) - (\text{expected number of such edges}). \tag{17}
\]

This benefit function is called modularity. It is a function of the
particular division of the network into groups, with larger values
indicating stronger community structure. Hence we should, in principle,
be able to find good divisions of a network into communities by
optimizing the modularity over possible divisions. This approach,
proposed in Ref. [12] and since pursued by a number of authors, has
proven highly effective in practice [36] and is the primary focus of
this article.

The first term in Eq. (17) is straightforward to calculate.
The second, however, is rather vague and needs to
be made more precise before we can evaluate the modu- 
larity. What exactly do we mean by the "expected num- 
ber" of edges within a community? Answering this ques- 
tion is essentially equivalent to choosing a "null model" 
against which to compare our network. The definition 
of the modularity involves a comparison of the number 
of within-group edges in a real network and the number 
in some equivalent randomized model network in which 
edges are placed without regard to community structure. 

It is one of the strengths of the modularity approach 
that it makes the role of this null model explicit and clear. 
All methods for finding communities are, in a sense, as- 
suming some null model, since any method must make a 
value judgment about when a particular density of edges 
is significant enough to define a community. In most 
cases, this assumption is hidden within the workings of a 
computer algorithm and is difficult to disentangle, even 
when the algorithm itself is well understood. By bring- 
ing its assumptions out into the open, the modularity 
method gives us more control over our calculations and 
more understanding of their implications. 

Our null model must have the same number of ver- 
tices n as the original network, so that we can divide 
it into the same groups for comparison, but apart from 
this we have a good deal of freedom about our choice of 
model. We here consider the broad class of randomized 
models in which we specify separately the probability Pij
for an edge to fall between every pair of vertices i,j. More 
precisely, Pij is the expected number of edges between i 
and j, a definition that allows for the possibility that 
there may be more than one edge between a pair of ver- 
tices, which happens in certain types of networks. We 
will consider some particular choices of Pij in a moment, 
but for now let us pursue the developments in general 
form. 

Given $P_{ij}$, the modularity can be defined as follows. The actual
number of edges falling between a particular pair of vertices i and j
is $A_{ij}$, Eq. (1), and the expected number is, by definition,
$P_{ij}$. Thus the actual minus expected number of edges between i and
j is $A_{ij} - P_{ij}$ and the modularity is (proportional to) the sum
of this quantity over all pairs of vertices belonging to the same
community. Let us define $g_i$ to be the community to which vertex i
belongs. Then the modularity can be written

\[
Q = \frac{1}{2m} \sum_{ij} \bigl[ A_{ij} - P_{ij} \bigr] \, \delta(g_i, g_j), \tag{18}
\]

where $\delta(r, s) = 1$ if r = s and 0 otherwise and m is again the
number of edges in the network. The extra factor of 1/2m in Eq. (18) is
purely conventional; it is included for compatibility with previous
definitions of the modularity and plays no part in the maximization of
Q since it is a constant for any given network. A special case of
Eq. (18) was given previously by the present author in [54] and
independently, in slightly different form, by White and Smyth [55]. A
number of other expressions for the modularity have also been presented
by various authors [18, 39, 40] and are convenient in particular
applications. Also of interest is the derivation of the modularity
given recently by Reichardt and Bornholdt [34], which is quite general
and provides an interesting alternative to the derivation presented
here.
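Equation (18) translates directly into code once a matrix of expected edge numbers is supplied. The sketch below is a minimal illustration: the function name `modularity`, the example graph, and in particular the uniform matrix used for $P_{ij}$ are assumptions made purely so that the example runs; the uniform matrix is a placeholder whose entries sum to 2m, not a recommended null model.

```python
def modularity(adj, expected, communities):
    """Q = (1/2m) * sum_ij [A_ij - P_ij] * delta(g_i, g_j), as in Eq. (18)."""
    n = len(adj)
    two_m = sum(sum(row) for row in adj)  # each edge appears twice in A
    total = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                total += adj[i][j] - expected[i][j]
    return total / two_m


# Hypothetical example: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

# Placeholder expectation (an assumption of this sketch): the same value for
# every pair, scaled so that all entries sum to 2m.
two_m = 2 * len(edges)
uniform = [[two_m / n ** 2] * n for _ in range(n)]

q_split = modularity(adj, uniform, [0, 0, 0, 1, 1, 1])
q_single = modularity(adj, uniform, [0, 0, 0, 0, 0, 0])
print(q_split, q_single)
```

With every vertex in a single community the actual and expected edge counts cancel, so the second value is zero up to rounding.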

Returning to the null model, how should $P_{ij}$ be chosen? The choice
is not entirely unconstrained. First, we consider in this paper only
undirected networks, which implies that $P_{ij} = P_{ji}$. Second, it
is axiomatically the case that Q = 0 when all vertices are placed in a
single group together: by definition, the number of edges within groups
and the expected number of such edges are both equal to m in this case.
Setting all $g_i$ equal in Eq. (18), we find that
$\sum_{ij} [A_{ij} - P_{ij}] = 0$ or equivalently

\[
\sum_{ij} P_{ij} = \sum_{ij} A_{ij} = 2m. \tag{19}
\]

This equation says that we are restricted to null models in which the
expected number of edges in the entire network equals the actual number
of edges in the original network, a natural choice if our comparison of
numbers of edges within groups is to have any meaning.

Beyond these basic considerations, there are many possible choices of
null model and several have been considered previously in the
literature. Perhaps the simplest is the standard (Bernoulli) random
graph, in which edges appear with equal probability $P_{ij} = p$
between all vertex pairs. With a suitably chosen value of p this model
can be made to satisfy Eq. (19) but, as many authors have pointed out
[57, 58, 59], the model is not a good representation of most real-world
networks. A particularly glaring aspect in which it errs is its degree
distribution. The random graph has a binomial degree distribution (or
Poisson in the limit of large graph size), which is entirely unlike the
right-skewed degree distributions found in most real-world networks
[60, 61]. A much better null model would be one in which the degree
distribution is approximately the same as that of the real-world
network of interest. To satisfy this demand we will restrict our
attention in this paper to models in which the expected degree of each
vertex within the model is equal to the actual degree of the
corresponding vertex in the real network. Noting that the expected
degree of vertex i is given by $\sum_j P_{ij}$, we can express this
condition as

\[
\sum_j P_{ij} = k_i. \tag{20}
\]

If this constraint is satisfied, then Eq. (19) is automatically
satisfied as well, since $\sum_i k_i = 2m$.



Equation (20) is a considerably more stringent constraint than Eq. (19)
(in most cases, for instance, it excludes the Bernoulli random graph),
but it is one that we believe makes good sense, and one moreover that
has a variety of desirable consequences for the developments that
follow.

The simplest null model in this class, and the only one that has been
considered at any length in the past, is the model in which edges are
placed entirely at random, subject to the constraint (20). That is, the
probability that an end of a randomly chosen edge attaches to a
particular vertex i depends only on the expected degree $k_i$ of that
vertex, and the probabilities for the two ends of a single edge are
independent of one another. This implies that the expected number of
edges $P_{ij}$ between vertices i and j is the product
$f(k_i) f(k_j)$ of separate functions of the two degrees, where the
functions must be the same since $P_{ij}$ is symmetric. Then Eq. (20)
implies

\[
\sum_{j=1}^n P_{ij} = f(k_i) \sum_{j=1}^n f(k_j) = k_i \tag{21}
\]

for all i and hence $f(k_i) = C k_i$ for some constant C. And Eq. (19)
says that

\[
2m = \sum_{ij} P_{ij} = C^2 \sum_{ij} k_i k_j = (2mC)^2, \tag{22}
\]

and hence $C = 1/\sqrt{2m}$ and

\[
P_{ij} = \frac{k_i k_j}{2m}. \tag{23}
\]

This model has been studied in the past in its own right as a model of
a network, for instance by Chung and Lu [62]. It is also closely
related to the configuration model, which has been studied widely in
the mathematics and physics literature. Indeed, essentially all
expected properties of our model and the configuration model are
identical in the limit of large network size, and hence Eq. (23) can be
considered equivalent to the configuration model in this limit.²
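The constraints (19) and (20) can be checked numerically for the null model of Eq. (23). The sketch below is illustrative: the example graph and function names are assumptions, and the uniform model is included only for contrast, with p = 2m/n^2 obtained here by applying Eq. (19) to a constant matrix over all n^2 ordered pairs.

```python
def degree_null_model(adj):
    """P_ij = k_i * k_j / 2m, as in Eq. (23)."""
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    return [[ki * kj / two_m for kj in degrees] for ki in degrees]


def uniform_null_model(adj):
    """Constant P_ij = p with p chosen so that the entries sum to 2m (Eq. 19)."""
    n = len(adj)
    two_m = sum(sum(row) for row in adj)
    p = two_m / n ** 2
    return [[p] * n for _ in range(n)]


# Hypothetical example: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

degrees = [sum(row) for row in adj]
P = degree_null_model(adj)
U = uniform_null_model(adj)

print(degrees)
print([round(sum(row), 6) for row in P])  # expected degrees under Eq. (23)
print([round(sum(row), 6) for row in U])  # expected degrees under the uniform model
print(sum(map(sum, P)), sum(map(sum, U)))  # both totals should be close to 2m
```

Both models reproduce the total number of edges, but only Eq. (23) reproduces the degree of each individual vertex.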

Although many of the developments outlined in this paper are true for
quite general choices of the null model used to define the modularity,
the choice (23) is the only one we will pursue here. It is worth
keeping in mind however that other choices are possible: Massen and
Doye, for instance, have used a variant of the configuration model in
which multiedges and self-edges were excluded. And further choices
could be useful in specific cases, such as cases where there are strong
correlations between the degrees of vertices or where there is a high
level of network transitivity [59].

² The technical difference between the two models is that the
configuration model is a random multigraph conditioned on the actual
degree sequence, while the model used here is a random multigraph
conditioned on the expected degree sequence. This makes the ensemble of
the former considerably smaller than that of the latter, but the
difference is analogous to the difference between canonical and grand
canonical ensembles in statistical mechanics and the two give the same
answers in the thermodynamic limit for roughly the same reason. In
particular, we note that the probability of an edge falling between two
vertices i and j in the configuration model is also given by Eq. (23)
in the limit of large network size; for smaller networks, there are
corrections of order 1/n.



IV. SPECTRAL OPTIMIZATION OF MODULARITY 

Once we have an explicit expression for the modularity
we can determine the community structure by maximizing
it over possible divisions of the network. Unfortunately,
exhaustive maximization over all possible divisions
is computationally intractable because there are simply
too many divisions, but various approximate optimization
methods have proven effective [3, 33]. Here, we develop a
matrix-based approach analogous to the spectral partitioning
method of Section II, which leads not only to a whole array
of possible optimization algorithms but also to new insights
about the nature and implications of community structure in
networks.



A. Leading eigenvector method 

As before, let us consider initially the division of a
network into just two communities and denote a potential
such division by an index vector $\mathbf{s}$ with elements
$s_i = \pm 1$, as defined for spectral partitioning. We notice
that the quantity $\tfrac{1}{2}(s_i s_j + 1)$ is 1 if $i$
and $j$ belong to the same group and 0 if they belong to
different groups or, in the notation of Eq. (18),

$$\delta(g_i, g_j) = \tfrac{1}{2}(s_i s_j + 1). \tag{24}$$

Thus we can write (18) in the form

$$\begin{aligned}
Q &= \frac{1}{4m} \sum_{ij} [A_{ij} - P_{ij}](s_i s_j + 1) \\
  &= \frac{1}{4m} \sum_{ij} [A_{ij} - P_{ij}]\, s_i s_j,
\end{aligned} \tag{25}$$

where we have in the second line made use of Eq. (19).
This result can conveniently be rewritten in matrix form
as

$$Q = \frac{1}{4m} \mathbf{s}^T \mathbf{B} \mathbf{s}, \tag{26}$$



where $\mathbf{B}$ is the real symmetric matrix having elements

$$B_{ij} = A_{ij} - P_{ij}. \tag{27}$$

We call this matrix the modularity matrix and it plays
a role in the maximization of the modularity equivalent
to that played by the Laplacian in standard spectral
partitioning: Equation (26) is the equivalent of the
corresponding matrix expression for the cut size, and matrix
methods can thus be applied to the modularity that are the
direct equivalents of those developed for spectral
partitioning, as we now show.
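
The equivalence between the matrix form (26) and the original definition (18) can be checked numerically. The sketch below assumes the null model of Eq. (23) and uses a small hypothetical test network (two triangles joined by a single edge); the helper names `modularity_matrix`, `q_from_groups`, and `q_from_s` are illustrative and not taken from the original text.

```python
def modularity_matrix(edges, n):
    """Dense B with B[i][j] = A[i][j] - k_i*k_j/(2m), assuming Eq. (23)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1
    k = [sum(row) for row in A]
    two_m = sum(k)
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m

def q_from_groups(B, two_m, g):
    """Eq. (18): sum B_ij over same-group pairs, divided by 2m."""
    n = len(g)
    same = sum(B[i][j] for i in range(n) for j in range(n) if g[i] == g[j])
    return same / two_m

def q_from_s(B, two_m, s):
    """Eq. (26): Q = s^T B s / (4m) for an index vector of +1/-1 elements."""
    n = len(s)
    quad = sum(s[i] * B[i][j] * s[j] for i in range(n) for j in range(n))
    return quad / (2 * two_m)

# Hypothetical example: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
B, two_m = modularity_matrix(edges, 6)
q_matrix = q_from_s(B, two_m, [1, 1, 1, -1, -1, -1])
q_direct = q_from_groups(B, two_m, [0, 0, 0, 1, 1, 1])
```

The two values agree because the elements of $\mathbf{B}$ sum to zero, which is what removes the constant term between the first and second lines of Eq. (25).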

First, let us point out a few important properties of the
modularity matrix. The definition of the degree,
$k_i = \sum_j A_{ij}$, and Eq. (20) together imply that all rows
(and columns) of the modularity matrix sum to zero:

$$\sum_j B_{ij} = \sum_j A_{ij} - \sum_j P_{ij} = k_i - k_i = 0. \tag{28}$$

This immediately implies that for any network the vector
(1, 1, 1, . . .) is an eigenvector of the modularity matrix
with eigenvalue zero, just as is the case with the Laplacian.
Unlike the Laplacian, however, the eigenvalues of
the modularity matrix are not necessarily all of one sign
and in practice the matrix usually has both positive and
negative eigenvalues. This observation, and the
eigenspectrum of the modularity matrix in general, are, as
we will see, closely tied to the community structure of
the network.
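
Both properties can be illustrated on a small example. The sketch below (a hypothetical six-vertex network and an illustrative helper `rayleigh`, neither from the original text) checks that every row of $\mathbf{B}$ sums to zero, and uses Rayleigh quotients to show that eigenvalues of both signs must exist: the quotient $\mathbf{x}^T \mathbf{B} \mathbf{x} / \mathbf{x}^T \mathbf{x}$ always lies between the smallest and largest eigenvalues, so one positive and one negative value suffice.

```python
# Hypothetical example: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
k = [sum(row) for row in A]
two_m = sum(k)
B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def rayleigh(B, x):
    """Return x^T B x / x^T x, bounded by the extreme eigenvalues of B."""
    size = len(x)
    num = sum(x[i] * B[i][j] * x[j] for i in range(size) for j in range(size))
    return num / sum(v * v for v in x)

row_sums = [sum(row) for row in B]              # Eq. (28): all vanish
positive = rayleigh(B, [1, 1, 1, -1, -1, -1])   # vector aligned with the two triangles
negative = rayleigh(B, [1, -1, 0, 0, 0, 0])     # vector separating two linked vertices
```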

Working from Eq. (26) we now proceed by direct analogy
with Section II. We write $\mathbf{s}$ as a linear combination of
the normalized eigenvectors $\mathbf{u}_i$ of the modularity matrix,
$\mathbf{s} = \sum_{i=1}^{n} a_i \mathbf{u}_i$ with $a_i = \mathbf{u}_i^T \mathbf{s}$. Then

$$Q = \frac{1}{4m} \sum_{i=1}^{n} a_i^2 \beta_i, \tag{29}$$

where $\beta_i$ is the eigenvalue of $\mathbf{B}$ corresponding to the
eigenvector $\mathbf{u}_i$. We now assume that the eigenvalues are
labeled in decreasing order $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$ and the
task of maximizing $Q$ is one of choosing the quantities $a_i^2$
so as to place as much as possible of the weight in the
sum (29) in the terms corresponding to the largest (most
positive) eigenvalues.

As with ordinary spectral partitioning, this would be a
simple task if our choice of $\mathbf{s}$ were unconstrained (apart
from normalization): we would just choose $\mathbf{s}$ proportional
to the leading eigenvector $\mathbf{u}_1$ of the modularity matrix.
But the elements of $\mathbf{s}$ are restricted to the values $s_i = \pm 1$,
which means that $\mathbf{s}$ cannot normally be chosen parallel
to $\mathbf{u}_1$. Again as before, however, good approximate
solutions can be obtained by choosing $\mathbf{s}$ to be as close to
parallel with $\mathbf{u}_1$ as possible, which is achieved by setting

$$s_i = \begin{cases} +1 & \text{if } u_i^{(1)} \ge 0, \\ -1 & \text{if } u_i^{(1)} < 0, \end{cases} \tag{30}$$

where $u_i^{(1)}$ denotes the $i$th element of $\mathbf{u}_1$.



This then is our first and simplest algorithm for com- 
munity detection: we find the eigenvector corresponding 
to the most positive eigenvalue of the modularity matrix 
and divide the network into two groups according to the 
signs of the elements of this vector. 
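
The algorithm can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: it uses the null model of Eq. (23), a dense matrix, and a power iteration on $\mathbf{B} + c\mathbf{I}$ where the shift $c$ is the largest absolute row sum of $\mathbf{B}$ (a Gershgorin-type bound chosen here for simplicity so that the most positive eigenvalue dominates). The shift rule, the starting vector, the iteration count, and all function names are choices made for this sketch, not prescriptions from the original text.

```python
import math

def modularity_matrix(edges, n):
    """Dense B with B[i][j] = A[i][j] - k_i*k_j/(2m), assuming Eq. (23)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1
    k = [sum(row) for row in A]
    two_m = sum(k)
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def leading_eigenvector(B, iterations=500):
    """Power iteration on B + shift*I; assumes B is not the zero matrix."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)  # makes all eigenvalues >= 0
    v = [float(i + 1) for i in range(n)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def bisect(B):
    """Eq. (30): assign each vertex by the sign of its leading-eigenvector element."""
    return [1 if x >= 0 else -1 for x in leading_eigenvector(B)]

# Hypothetical example: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
s = bisect(modularity_matrix(edges, 6))
```

The overall sign of an eigenvector is arbitrary, so only the grouping of vertices is meaningful, not which group is labeled $+1$.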

In practice, this method works nicely, as discussed
in [32]. Making the choice (23) for our null model, we
have applied it to a variety of standard and less standard
test networks and find that it does a good job of
finding community divisions. Figure 2 shows a representative
example, an animal social network assembled and
studied by Lusseau et al. [68]. The vertices in this network
represent 62 bottlenose dolphins living in Doubtful
Sound, New Zealand, with social ties between dolphin
pairs established by direct observation over a period of
several years. This network is of particular interest because,
during the course of the study, the dolphin group
split into two smaller subgroups following the departure
of a key member of the population. The subgroups are
represented by the shapes of the vertices in the figure.
The dotted line denotes the division of the network into
two equal-sized groups found by the standard spectral
partitioning method. While, as expected, this method
does a creditable job of dividing the network into groups
of these particular sizes, it is clear to the eye that this is
not the natural community division of the network and
neither does it correspond to the division observed in
real life. The spectral partitioning method is hamstrung
by the requirement that we specify the sizes of the two
communities; unless we know what they are in advance,
blind application of the method will not usually find the
"right" division of the network.

The method based on the leading eigenvector of the
modularity matrix, however, does much better. Unconstrained
by the need to find groups of any particular size,
this method finds the division denoted by the solid line
in the figure, which, as we see, corresponds quite closely
to the split actually observed: all but three of the 62
dolphins are placed in the correct groups.

FIG. 2 The dolphin social network of Lusseau et al. [68]. The
dashed curve represents the division into two equally sized
parts found by a standard spectral partitioning calculation
(Section II). The solid curve represents the division found
by the modularity-based method of this section. And the
squares and circles represent the actual division of the
network observed when the dolphin community split into two
as a result of the departure of a keystone individual. (The
individual who departed is represented by the triangle.)

The magnitudes of the elements of the eigenvector $\mathbf{u}_1$
also contain useful information about the network,
indicating, as discussed in [32], the "strength" with which
vertices belong to the communities in which they are
placed. As an example of this phenomenon consider
Fig. 3, which depicts the network of political books from
Ref. [32]. This network, compiled by V. Krebs (unpublished),
represents recent books on US politics, with
edges connecting pairs of books that are frequently
purchased by the same customers of the on-line bookseller
Amazon.com. Applying our method, we find that the
network divides as shown in the figure, with the colors
of the vertices representing the values of the elements of
the eigenvector. The two groups correspond closely to
the apparent alignment of the books according to left-wing
and right-wing points of view, and are suggestively
colored blue and red in the figure. 3 The most blue
and most red vertices are those that, by our calculation,
belong most strongly to the two groups and are thus,
perhaps, the "most left-wing" and "most right-wing" of the
books under consideration. Those familiar with current
US politics will be unsurprised to learn that the most
left-wing book in this sense was the polemical Bushwacked by
Molly Ivins and Lou Dubose. Perhaps more surprising is
the most right-wing book: A National Party No More by
Zell Miller. 4



B. Other eigenvectors of the modularity matrix 

The algorithm described in the previous section has 
two obvious shortcomings. First, it divides networks into 
only two communities, while real-world networks can certainly
have more than two. Second, it makes use only
of the leading eigenvector of the modularity matrix and 
ignores all the others, which throws away useful infor- 
mation contained in those other vectors. Both of these 
shortcomings are remedied by the following generaliza- 
tion of the method. 

Consider the division of a network into $c$ non-overlapping
communities, where $c$ may now be greater
than 2. Following Alpert and Yao [69] and more recently
White and Smyth, let us define an $n \times c$
index matrix $\mathbf{S}$ with one column for each community:
$\mathbf{S} = (\mathbf{s}_1 | \mathbf{s}_2 | \ldots | \mathbf{s}_c)$. Each column is an index vector now
of (0,1) elements (rather than ±1 as previously), such
that

$$S_{ij} = \begin{cases} 1 & \text{if vertex $i$ belongs to community $j$,} \\ 0 & \text{otherwise.} \end{cases} \tag{31}$$

Note that the columns of $\mathbf{S}$ are mutually orthogonal, that
the rows each sum to unity, and that the matrix satisfies
the normalization condition $\mathrm{Tr}(\mathbf{S}^T \mathbf{S}) = n$.

3 By a fluke of recent history, the colors blue and red have come to
denote liberal and conservative points of view respectively in US
politics, where in most other parts of the world the color-scheme
is the other way around.

4 Miller is a former Democratic (i.e., ostensibly liberal) governor
and US senator for the state of Georgia. He became known
in the later years of his career, however, for views that aligned
more closely with the conservative Republicans than with the
Democrats. Even so, Miller was never the most conservative
member of the senate, nor is his book the most conservative in
this study. But our measure is not based on the content of the
books; it merely finds the vertices in the network that are most
central to their communities. The ranking of Miller's book in
this calculation results from its centrality within the community
of conservative book buying. This book, while not in fact as
right-wing as some, apparently appeals widely and exclusively
to conservatives, presumably because of the unusual standing of
its author as a nominal Democrat supporting the Republican
cause.

Observing that the $\delta$-symbol in Eq. (18) is now given
by

$$\delta(g_i, g_j) = \sum_{k=1}^{c} S_{ik} S_{jk}, \tag{32}$$

the modularity for this division of the network is

$$Q = \sum_{i,j=1}^{n} \sum_{k=1}^{c} B_{ij} S_{ik} S_{jk} = \mathrm{Tr}(\mathbf{S}^T \mathbf{B} \mathbf{S}), \tag{33}$$

where here and henceforth we suppress the leading
multiplicative constant $1/2m$ from Eq. (18), which has no
effect on the position of the maximum of the modularity.
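
A small numerical illustration of Eqs. (31) to (33) follows. It is a sketch under stated assumptions: the null model of Eq. (23), a hypothetical nine-vertex network made of three triangles joined in a ring, and illustrative helper names (`index_matrix`, `trace_StBS`) that do not come from the original text. The constant $1/2m$ is restored at the end so that the result is the conventional modularity.

```python
def modularity_matrix(edges, n):
    """Dense B with B[i][j] = A[i][j] - k_i*k_j/(2m), assuming Eq. (23)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1
    k = [sum(row) for row in A]
    two_m = sum(k)
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m

def index_matrix(groups, c):
    """Eq. (31): S[i][k] = 1 if vertex i belongs to community k, else 0."""
    return [[1 if g == k else 0 for k in range(c)] for g in groups]

def trace_StBS(B, S):
    """Eq. (33): Tr(S^T B S) written out as an explicit triple sum."""
    n, c = len(S), len(S[0])
    return sum(S[i][k] * B[i][j] * S[j][k]
               for k in range(c) for i in range(n) for j in range(n))

# Hypothetical example: three triangles joined in a ring by single edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 7), (6, 8), (7, 8), (2, 3), (5, 6), (8, 0)]
B, two_m = modularity_matrix(edges, 9)
S = index_matrix([0, 0, 0, 1, 1, 1, 2, 2, 2], 3)

row_sums = [sum(row) for row in S]               # each row sums to unity
trace_StS = sum(x * x for row in S for x in row)  # Tr(S^T S) = n
Q = trace_StBS(B, S) / two_m                      # restore the factor 1/2m
```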

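Writing $\mathbf{B} = \mathbf{U} \mathbf{D} \mathbf{U}^T$, where $\mathbf{U} = (\mathbf{u}_1 | \mathbf{u}_2 | \ldots)$ is the
matrix of eigenvectors of $\mathbf{B}$ and $\mathbf{D}$ is the diagonal matrix
of eigenvalues $D_{ii} = \beta_i$, we then find that

$$Q = \sum_{j=1}^{n} \sum_{k=1}^{c} \beta_j (\mathbf{u}_j^T \mathbf{s}_k)^2. \tag{34}$$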



Again we wish to maximize this modularity, but now we 
have no constraint on the number c of communities; we 
can give S as many columns as we like in our effort to 
make Q as large as possible. 

If the elements of the matrix $\mathbf{S}$ were unconstrained
apart from the basic conditions on the rows and columns
mentioned above, a choice of $c$ communities would be
equivalent to choosing $c - 1$ independent, mutually
orthogonal columns $\mathbf{s}_1 \ldots \mathbf{s}_{c-1}$. (Only $c - 1$ of the columns
are independent, the last being fixed by the condition
that the rows of $\mathbf{S}$ sum to unity.) In this case our path
would be clear: Q would be maximized by choosing the 
columns proportional to the leading eigenvectors of B. 
However, only those eigenvectors corresponding to pos- 
itive eigenvalues can give positive contributions to the 
modularity, so the optimal modularity would be achieved 
by choosing exactly as many independent columns of S as 
there are positive eigenvalues, or equivalently by choosing 
the number of groups c to be 1 greater than the number 
of positive eigenvalues. 

Unfortunately, our problem has the additional con- 
straint that the index vectors have only binary (0,1) 
elements, which means it may not be possible to find as 
many index vectors making positive contributions to the 
modularity as the set of positive eigenvalues suggests. 
Thus the number of positive eigenvalues, plus 1, is an 
upper bound on the number of communities and again 
we see that there is an intimate connection between the 
properties of the modularity matrix and the community 
structure of the network it describes. 



C. Vector partitioning algorithm 



In Section IV.A we maximized the modularity approximately
by focusing solely on the term in $Q$ proportional
to the largest eigenvalue of $\mathbf{B}$. Let us now make the more
general (and often better) approximation of keeping the
leading $p$ eigenvalues, where $p$ may be anywhere between
1 and $n$. Some of the eigenvalues, however, may be negative,
which will prove inconvenient. To get around this
we rewrite Eq. (33) thus:

$$\begin{aligned}
Q &= n\alpha + \mathrm{Tr}[\mathbf{S}^T \mathbf{U} (\mathbf{D} - \alpha \mathbf{I}) \mathbf{U}^T \mathbf{S}] \\
  &= n\alpha + \sum_{j=1}^{n} \sum_{k=1}^{c} (\beta_j - \alpha) \Bigl[ \sum_{i=1}^{n} U_{ij} S_{ik} \Bigr]^2,
\end{aligned} \tag{35}$$

where $\alpha$ is a constant whose value we will choose shortly
and we have made use of $\mathrm{Tr}(\mathbf{S}^T \mathbf{S}) = n$ and the fact that
$\mathbf{U}$ is orthogonal.

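Now, employing an argument similar to that used for
ordinary spectral partitioning in [69], let us define a set
of vertex vectors $\mathbf{r}_i$, $i = 1 \ldots n$, of dimension $p$, such that
the $j$th component of the $i$th vector is

$$[\mathbf{r}_i]_j = \sqrt{\beta_j - \alpha}\; U_{ij}. \tag{36}$$

Provided we choose $\alpha \le \beta_p$, $\mathbf{r}_i$ is guaranteed real for
all $i$. Then, dropping terms in (35) proportional to the
smallest $n - p$ of the factors $\beta_j - \alpha$, we have

$$\begin{aligned}
Q &\simeq n\alpha + \sum_{j=1}^{p} \sum_{k=1}^{c} \Bigl[ \sum_{i=1}^{n} \sqrt{\beta_j - \alpha}\; U_{ij} S_{ik} \Bigr]^2 \\
  &= n\alpha + \sum_{k=1}^{c} \sum_{j=1}^{p} \Bigl[ \sum_{i \in G_k} [\mathbf{r}_i]_j \Bigr]^2 \\
  &= n\alpha + \sum_{k=1}^{c} |\mathbf{R}_k|^2,
\end{aligned} \tag{37}$$

where $G_k$ is the set of vertices comprising group $k$ and
the community vectors $\mathbf{R}_k$, $k = 1 \ldots c$, are

$$\mathbf{R}_k = \sum_{i \in G_k} \mathbf{r}_i. \tag{38}$$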



The community structure problem is now equivalent 
to choosing a division of the vertices into groups so as to 
maximize the magnitudes of the vectors $\mathbf{R}_k$. This means
we need to arrange that the individual vertex vectors 
going into each group point in approximately the same 
direction. Problems of this type are called vector parti- 
tioning problems. 

The parameter $p$ controls the balance between the
complexity of the vector partitioning problem and the
accuracy of the approximation we make by keeping only some
of the eigenvalues. The calculations will be faster but less
accurate for smaller $p$ and slower but more accurate for
larger $p$. For the special case $p = n$ where we keep all of
the eigenvalues, Eq. (37) is exact. In this case, we note
that the vertex vectors have the property

$$\mathbf{r}_i^T \mathbf{r}_j = \sum_{k=1}^{n} U_{ik} (\beta_k - \alpha) U_{jk} = B_{ij} - \alpha \delta_{ij}. \tag{39}$$

It is then simple to see that Eq. (37) is trivially equivalent
to the fundamental definition (18) of the modularity,
so in the $p = n$ case our mapping to a vector partitioning
problem gives little insight into the modularity
maximization problem. The real advantage of our approach
comes when $p < n$, where the method extracts
precisely those factors that make the principal contributions
to the modularity (i.e., those corresponding to the
largest eigenvalues), discarding those that have relatively
little effect. In practice, as we have seen for the
single-eigenvector algorithm, the main features of the
community structure are often captured by just the first
eigenvector or perhaps the first few, which allows us to reduce
the complexity of our optimization problem immensely.

The approach is similar in concept to the standard
technique of principal components analysis (PCA) used
to reduce high-dimensional data sets to manageably
small dimension by focusing on the eigendirections along
which the variance about the mean is greatest and ignoring
directions that contribute little. In fact, this similarity
is more than skin-deep: the form of our modularity
matrix is closely analogous to the covariance matrix
whose eigenvectors are the basis for PCA. The elements
of the covariance matrix are correlation functions of the
form $\langle xy \rangle - \langle x \rangle \langle y \rangle$, where $x$ and $y$ denote measured
variables in the data set. Thus the covariance is the
difference between the actual value of the mean product $\langle xy \rangle$
of two variables and the value $\langle x \rangle \langle y \rangle$ expected by chance
for that product if the variables were uncorrelated. Similarly,
the elements $B_{ij} = A_{ij} - k_i k_j / 2m$ of the modularity
matrix are equal to the actual number of edges $A_{ij}$
between a given pair of vertices minus the number $k_i k_j / 2m$
expected by chance, expressed in a product form. In
a sense, our spectral method for modularity optimization
can be regarded as a "principal components analysis
for networks." This aspect of the method is clear, for
instance, in the study of political books represented in
Fig. 3: the leading eigenvector used to assign the colors
to the vertices in the figure is playing a role equivalent
to the eigendirections in PCA, defining a "direction of
greatest variation" in the structure of the network. The
vertex vectors of Eq. (36) are similarly analogous to the
low-dimensional projections used in PCA. 5

5 This suggests, for instance, that the vertex vectors for p = 2
or 3 could be used to define graph layouts for visualizing
networks in 2 or 3 dimensions. Either the endpoints of the vectors
could define vertex positions themselves or they could be used
as starting positions for a spring embedding visualizer or other
more conventional layout scheme.

Returning to our algorithm, let us consider again the
special case of the division of a network into just two
communities. (Multi-way division is considered in
Section VI.) Since (1, 1, 1, ...) is always an eigenvector of
the modularity matrix and the eigenvectors are orthogonal,
the elements of all other eigenvectors must sum to
zero:

$$\sum_{i=1}^{n} U_{ij} = \mathbf{1}^T \mathbf{u}_j = 0, \tag{40}$$

where $\mathbf{1} = (1, 1, 1, \ldots)$. But Eq. (36) then implies that

$$\sum_{i=1}^{n} [\mathbf{r}_i]_j = \sum_{i=1}^{n} \sqrt{\beta_j - \alpha}\; U_{ij} = \sqrt{\beta_j - \alpha} \sum_{i=1}^{n} U_{ij} = 0, \tag{41}$$

and hence

$$\sum_{i=1}^{n} \mathbf{r}_i = 0 \tag{42}$$

for any value of $p$. This in turn implies that the community
vectors $\mathbf{R}_k$ also sum to zero:

$$\sum_{k=1}^{c} \mathbf{R}_k = \sum_{k=1}^{c} \sum_{i \in G_k} \mathbf{r}_i = \sum_{i=1}^{n} \mathbf{r}_i = 0. \tag{43}$$

And as a special case of this last result, any division of
a network into two communities has community vectors
$\mathbf{R}_1$ and $\mathbf{R}_2$ that are of equal magnitude and oppositely
directed.

Furthermore, the maximum of the modularity,
Eq. (37), is always achieved when each individual vertex
vector $\mathbf{r}_i$ has a positive inner product with the
community vector of the community to which the vertex
belongs. To see this, observe that removing a vertex $i$ from
a community $k$ where $\mathbf{R}_k \cdot \mathbf{r}_i < 0$ produces a change in
the corresponding term $|\mathbf{R}_k|^2$ in Eq. (37) of

$$|\mathbf{R}_k - \mathbf{r}_i|^2 - |\mathbf{R}_k|^2 = |\mathbf{r}_i|^2 - 2\, \mathbf{R}_k \cdot \mathbf{r}_i > 0. \tag{44}$$

Similarly adding vertex $i$ to a community for which
$\mathbf{R}_k \cdot \mathbf{r}_i > 0$ also increases $|\mathbf{R}_k|^2$. Hence, we can always
increase the modularity by moving vertices until they are
in groups such that $\mathbf{R}_k \cdot \mathbf{r}_i > 0$.
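
The identity in Eq. (44) is easy to verify numerically. The sketch below uses three made-up two-dimensional vertex vectors (they are not derived from any real network) and an illustrative helper `removal_gain`; it simply compares the closed-form change with the directly computed difference of squared magnitudes.

```python
def dot(a, b):
    """Inner product of two vectors given as sequences of floats."""
    return sum(x * y for x, y in zip(a, b))

def removal_gain(R_k, r_i):
    """Right-hand side of Eq. (44): |r_i|^2 - 2 R_k . r_i."""
    return dot(r_i, r_i) - 2 * dot(R_k, r_i)

# Hypothetical vertex vectors currently assigned to one community.
members = [(2.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]
R_k = tuple(sum(v[d] for v in members) for d in range(2))  # community vector

r_i = members[2]                    # candidate vertex to remove
inner = dot(R_k, r_i)               # negative, so the vertex is misplaced
R_after = tuple(R_k[d] - r_i[d] for d in range(2))
direct_change = dot(R_after, R_after) - dot(R_k, R_k)
formula_change = removal_gain(R_k, r_i)
```

Because the inner product is negative here, both expressions give the same strictly positive change, in line with the argument above.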

Taken together, these results imply that possible
candidates for the optimal division of a network into two
groups are fully specified by just the direction of the single
vector $\mathbf{R}_1$. Once we have this direction, we know that
the vertices divide according to whether their projection
along this direction is positive or negative. Alternatively,
we can consider the direction of $\mathbf{R}_1$ to define a perpendicular
plane through the origin in the $p$-dimensional vector
space occupied by the vertex vectors $\mathbf{r}_i$. The vertices
then divide according to which side of this plane their
vectors fall on. Finding the maximum of the modularity
is then a matter of choosing this bisecting plane to
maximize the magnitude of $\mathbf{R}_1$.

In general, this still leaves us with a moderately difficult
optimization problem: the number of bisecting
planes that give distinct partitions of the vertex vectors
is large and difficult to enumerate as the dimension $p$
of the space becomes large. For the case $p = 2$, however,
a relatively simple solution exists. Consider Fig. 4,
which shows a typical example of the vertex vectors. 6
In this two-dimensional case, there are only $n$ topologically
distinct choices of the bisecting plane (actually just
a line in this case, denoted by the dashed line in the figure),
and furthermore the divisions of the vertices that
these choices represent change by only a single vertex
at a time as we rotate the plane about the origin. This
makes it computationally simple to perform the rotation,
keep track of the value of $\mathbf{R}_1$, and so find the maximum
of the modularity within this approximation. Evaluating
the magnitude of $\mathbf{R}_1$ involves a constant number of
operations each time we move the line, and hence the
total work involved in finding the maximum is O(n) for
all $n$ possible positions, which is the same as the O(n)
operations needed to separate the vertices in the $p = 1$
case.

FIG. 4 A plot of the vertex vectors for a small network
with p = 2. The dotted line represents one of the n possible
topologically distinct cut planes.

6 In fact, this figure shows the vectors for the "karate club" network
used previously as an example in Ref. [32].

For p > 2, we do not know of an efficient method to 
enumerate exhaustively all topologically distinct bisect- 
ing planes in the vertex vector space, and hence we have 
to turn to approximate methods for solving the vector 
partitioning problem. A number of reasonable heuristics 
have been described in the past. We have found acceptable
though not spectacular results, for instance, with the
"MELO" algorithm of [69], which is essentially a greedy
algorithm in which a grouping of vectors is built up by 
repeatedly adding to it the vector that makes the largest 
contribution to Q. 
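
For the $p = 2$ case described above, the search over cut lines can be sketched as follows. This is a deliberately simple version under stated assumptions: it sorts the angles at which the rotating line crosses a vertex vector, tries one line in each gap between consecutive crossings, and recomputes both community vectors from scratch each time. That costs O(n^2) after the sort rather than using the incremental single-vertex update that gives the O(n) rotation discussed in the text. The function name and the four example vectors are hypothetical.

```python
import math

def best_bisection_2d(vectors):
    """Return (value, group): the largest |R_1|^2 + |R_2|^2 found over the
    candidate cut lines, and the set of indices on the positive side."""
    # Vertex i changes sides when the line's normal is perpendicular to r_i.
    critical = sorted((math.atan2(y, x) + math.pi / 2) % math.pi
                      for x, y in vectors)
    following = critical[1:] + [critical[0] + math.pi]  # wrap around by pi
    candidates = [(a + b) / 2 for a, b in zip(critical, following)]

    everyone = set(range(len(vectors)))
    best_value, best_group = -1.0, set()
    for theta in candidates:
        nx, ny = math.cos(theta), math.sin(theta)  # normal of the cut line
        group = {i for i, (x, y) in enumerate(vectors) if x * nx + y * ny > 0}
        value = 0.0
        for side in (group, everyone - group):
            rx = sum(vectors[i][0] for i in side)
            ry = sum(vectors[i][1] for i in side)
            value += rx * rx + ry * ry
        if value > best_value:
            best_value, best_group = value, group
    return best_value, best_group

# Hypothetical vertex vectors that sum to zero, as required by Eq. (42).
vectors = [(1.0, 0.2), (0.9, -0.3), (-1.1, 0.1), (-0.8, 0.0)]
value, group = best_bisection_2d(vectors)
```

Which of the two groups is reported as the positive side depends on the orientation of the normal, so only the partition itself is significant.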



D. Choice of a 

Before implementing any of these methods, a crucial
question we must answer is what value we should choose
for the parameter $\alpha$. By tuning this value we can improve
the accuracy of our approximation to $Q$ as follows.

By dropping the $n - p$ most negative eigenvalues, we are
in effect making an approximation to the matrix $\mathbf{B} - \alpha \mathbf{I}$
in which it takes not its full value $\mathbf{U}(\mathbf{D} - \alpha \mathbf{I})\mathbf{U}^T$, but an
approximate value $\mathbf{U}(\mathbf{D}' - \alpha \mathbf{I}')\mathbf{U}^T$, where $\mathbf{D}'$ and $\mathbf{I}'$ are
the matrices $\mathbf{D}$ and $\mathbf{I}$ with the last $n - p$ diagonal elements
set to zero. We can quantify the error this introduces by
calculating the sum of the squares of the elements of the
difference between the two matrices, which is given by

$$\begin{aligned}
\chi^2 &= \mathrm{Tr}[\mathbf{U}(\mathbf{D} - \alpha \mathbf{I})\mathbf{U}^T - \mathbf{U}(\mathbf{D}' - \alpha \mathbf{I}')\mathbf{U}^T]^2 \\
       &= \mathrm{Tr}[(\mathbf{D} - \alpha \mathbf{I}) - (\mathbf{D}' - \alpha \mathbf{I}')]^2 = \sum_{i=p+1}^{n} (\beta_i - \alpha)^2,
\end{aligned} \tag{45}$$

where in the second line we have made use of the fact
that $\mathbf{U}$ is orthogonal.

Minimizing this error by setting the derivative
$d\chi^2/d\alpha = 0$, we find

$$\alpha = \frac{1}{n - p} \sum_{i=p+1}^{n} \beta_i. \tag{46}$$

In other words, the minimal mean square error introduced
by our approximation is achieved by setting $\alpha$
equal to the mean of the eigenvalues that have been
dropped. The only exception is when $p = n$, where the
choice of $\alpha$ makes no difference since no approximation
is being made anyway. In our calculations we have used
$\alpha = \beta_n$ in this case, but any choice $\alpha \le \beta_n$ would work
equally well, since this is the condition under which the
vertex vectors of Eq. (36) remain real.
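
The rule of Eq. (46) amounts to a one-line average. The sketch below, with illustrative function names and a made-up list of eigenvalues, computes $\alpha$ for a given $p$ and evaluates the error of Eq. (45) so that the minimum can be checked against nearby values. For $p = n$ it falls back on the smallest eigenvalue, following the choice $\alpha = \beta_n$ mentioned above.

```python
def optimal_alpha(betas, p):
    """Eq. (46): mean of the n - p dropped (smallest) eigenvalues."""
    ordered = sorted(betas, reverse=True)  # beta_1 >= beta_2 >= ... >= beta_n
    dropped = ordered[p:]
    if not dropped:                        # p == n: no approximation is made
        return ordered[-1]
    return sum(dropped) / len(dropped)

def chi_squared(betas, p, alpha):
    """Eq. (45): summed squared error from dropping the last n - p terms."""
    dropped = sorted(betas, reverse=True)[p:]
    return sum((b - alpha) ** 2 for b in dropped)

# Hypothetical eigenvalue spectrum; keep the p = 2 leading eigenvalues.
betas = [3.0, 1.0, 0.0, -1.0, -3.0]
alpha = optimal_alpha(betas, 2)
error_at_alpha = chi_squared(betas, 2, alpha)
```

Since the dropped eigenvalues are all no larger than $\beta_p$, their mean automatically satisfies the requirement $\alpha \le \beta_p$.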



V. IMPLEMENTATION 



Implementation of the methods described in Section IV
is straightforward. The leading-eigenvector method of
Section IV.A requires us to find only the single eigenvector
of the modularity matrix $\mathbf{B}$ corresponding to the
most positive eigenvalue. This is most efficiently achieved
by the direct multiplication or power method. Starting
with a trial vector, we repeatedly multiply by the modularity
matrix and, unless we are unlucky enough to have
chosen another eigenvector as our trial vector, the result
will converge to the eigenvector of the matrix having
the eigenvalue of largest magnitude. In some cases this
eigenvalue will be the most positive one, in which case
our calculation ends at this point. In other cases the
eigenvalue of largest magnitude may be negative. If this
happens then, denoting this eigenvalue by $\beta_n$, we calculate
the shifted matrix $\mathbf{B} - \beta_n \mathbf{I}$, which has eigenvalues
$\beta_i - \beta_n$ (necessarily all nonnegative) and the same
eigenvectors as the modularity matrix itself. Then we repeat
the power-method calculation for this new matrix and
this time the eigenvalue of largest magnitude must be
$\beta_1 - \beta_n$ and the corresponding eigenvector is the one we
are looking for.

For the method of Section IV.B we require either all of
the eigenvectors of the modularity matrix or a subset
corresponding to the $p$ most positive eigenvalues. These are
most conveniently calculated using the Lanczos method
or one of its variants [70]. The fundamental matrix
operation at the heart of the Lanczos method is again
multiplication of the matrix $\mathbf{B}$ into a trial vector.

Efficient implementation of any of these methods thus
rests upon our ability to rapidly multiply an arbitrary
vector $\mathbf{x}$ by the modularity matrix. This presents a problem
because the modularity matrix is dense, and hence it
appears that matrix multiplications will demand $O(n^2)$
time each, where $n$ is, as before, the number of vertices
in the network (which is also the size of the matrix). By
contrast, the equivalent calculation in standard spectral
partitioning is much faster because the Laplacian matrix
is sparse, having only $O(n + m)$ nonzero elements, where
$m$ is the number of edges in the network.

For the standard choice, Eq. (23), of null model used to
define the modularity, however, it turns out that we can
multiply by the modularity matrix just as fast as by the
Laplacian by making use of the special structure of the
matrix. In vector notation the modularity matrix can in
this case be written

$$\mathbf{B} = \mathbf{A} - \frac{\mathbf{k}\mathbf{k}^T}{2m}, \tag{47}$$

where $\mathbf{A}$ is the adjacency matrix and $\mathbf{k}$ is the
$n$-element vector whose elements are the degrees $k_i$ of the
vertices. Then

$$\mathbf{B}\mathbf{x} = \mathbf{A}\mathbf{x} - \frac{\mathbf{k}^T \mathbf{x}}{2m}\, \mathbf{k}. \tag{48}$$

Since the adjacency matrix is sparse, having only $O(m)$
elements, the first term can be evaluated in $O(m)$ time,
while the second requires us to evaluate the inner product
$\mathbf{k}^T \mathbf{x}$ only once and then multiply it into each element of $\mathbf{k}$
in turn, both operations taking $O(n)$ time. Thus the entire
matrix multiplication can be completed in $O(m + n)$
time, just as with the normal Laplacian matrix. If a
shift of the eigenvalues is required to find the most positive
one, as described above, then there is an additional
term $-\beta_n \mathbf{I}$ in the matrix, but this also can be multiplied
into an arbitrary vector in $O(n)$ time, so again the entire
operation can be completed in $O(m + n)$ time.

Typically $O(n)$ matrix multiplications are required for
either the power method or the Lanczos method to converge
to the required eigenvalues, and hence the calculation
takes $O((m + n)n)$ time overall. In the common case
in which the network is sparse and $m \propto n$, this simplifies
to $O(n^2)$.

While this is, essentially, the end of the calculation for
the power method, the Lanczos method unfortunately
demands more effort to find the eigenvectors themselves.
In fact, it takes $O(n^3)$ time to find all eigenvectors of a
matrix using the Lanczos method, which is quite slow.
There are on the other hand variants of the Lanczos
method (as well as other methods entirely) that can find
just a few leading eigenvectors faster than this, which
makes calculations that focus on a fixed small number of
eigenvectors preferable to ones that use all eigenvectors.
In our calculations we have primarily concentrated on
algorithms that use only one or two eigenvectors, which
typically run in time $O(n^2)$ on a sparse network.
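
The fast multiplication by the modularity matrix described in this section can be sketched directly from the decomposition $\mathbf{B}\mathbf{x} = \mathbf{A}\mathbf{x} - (\mathbf{k}^T\mathbf{x}/2m)\,\mathbf{k}$. The version below assumes a simple undirected network stored as adjacency lists (so the degree of a vertex is the length of its list) and the null model of Eq. (23). The function names and the example network are illustrative; the dense routine is included only to check the result.

```python
def multiply_B(adjacency, x):
    """Return Bx = Ax - (k^T x / 2m) k without ever forming the dense B."""
    k = [len(nbrs) for nbrs in adjacency]                  # vertex degrees
    two_m = sum(k)
    Ax = [sum(x[j] for j in nbrs) for nbrs in adjacency]   # O(m) work
    scale = sum(ki * xi for ki, xi in zip(k, x)) / two_m   # O(n) work
    return [ax - scale * ki for ax, ki in zip(Ax, k)]      # O(n) work

def multiply_B_dense(adjacency, x):
    """Reference O(n^2) product using B[i][j] = A[i][j] - k_i*k_j/(2m)."""
    n = len(adjacency)
    k = [len(nbrs) for nbrs in adjacency]
    two_m = sum(k)
    result = []
    for i in range(n):
        row = [(1 if j in adjacency[i] else 0) - k[i] * k[j] / two_m
               for j in range(n)]
        result.append(sum(b * xj for b, xj in zip(row, x)))
    return result

# Hypothetical example: two triangles joined by the edge (2, 3).
adjacency = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fast = multiply_B(adjacency, x)
slow = multiply_B_dense(adjacency, x)
on_ones = multiply_B(adjacency, [1.0] * 6)  # uniform vector has eigenvalue zero
```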






A. Refinement of the modularity 

The methods for spectral optimization of the modularity
described in Section IV are only approximate. Indeed,
the problem of modularity optimization is formally equiv- 
alent to an instance of the NP-hard MAX-CUT problem, 
so it is almost certainly the case that no polynomial-time 
algorithm exists that will find the modularity optimum 
in all cases. Given that the algorithms we have described 
run in polynomial time, it follows that they must fail to 
find the optimum in some cases, and hence that there is 
room for improvement of the results. 

In standard graph partitioning applications it is com- 
mon to use a spectral approach based on the graph Lapla- 
cian as a first pass at the problem of dividing a network. 
The spectral method gives a broad picture of the general 
shape the division should take, but there is often room 
for improvement. Typically another algorithm, such as 
the Kernighan-Lin algorithm [71], which swaps vertex
pairs between groups in an effort to reduce the cut size, 
is used to refine this first pass, and the resulting two- 
stage joint strategy gives considerably better results than 
either stage on its own. 

We have found that a similar joint strategy gives good 
results in the present case also: the divisions found 
with our spectral approach can be improved in small 
but significant ways by adding a refinement step akin 
to the Kernighan-Lin algorithm. As described in [32],
we take an initial division into two communities derived,
for instance, from the leading-eigenvector method of
Section IV.A and move single vertices between the
communities so as to increase the value of the modularity as
much as possible, with the constraint that each vertex 
can be moved only once. Repeating the whole process 
iteratively until no further improvement is obtained, we 
find a final value of the modularity which can improve 
on that derived from the spectral method alone by tens 
of percent in some cases, and smaller but still significant 
amounts in other cases. Although the absolute gains in 
modularity are not always large, we find that this refine- 
ment step is very much worth the effort it entails, raising 
the typical level of performance of our methods from the 
merely good to the excellent, when compared with other 
algorithms. Specific examples are given in [32].
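
One possible reading of this refinement step is sketched below. It is an interpretation under stated assumptions rather than the authors' implementation: within each pass, every vertex is moved exactly once, always choosing the unmoved vertex whose move gives the highest modularity (even if that is a decrease); the best intermediate state of the pass is then kept, and passes repeat until none improves $Q$. The modularity is recomputed from scratch after every trial move, which is simple but far slower than an incremental update. All function names and the starting division are illustrative.

```python
def modularity_matrix(edges, n):
    """Dense B with B[i][j] = A[i][j] - k_i*k_j/(2m), assuming Eq. (23)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        A[j][i] += 1
    k = [sum(row) for row in A]
    two_m = sum(k)
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m

def q_of(B, two_m, s):
    """Q = s^T B s / (4m) for an index vector of +1/-1 elements."""
    n = len(s)
    return sum(s[i] * B[i][j] * s[j]
               for i in range(n) for j in range(n)) / (2 * two_m)

def refine(B, two_m, s):
    """Kernighan-Lin-style single-vertex refinement of a two-way division."""
    s, best_q = list(s), q_of(B, two_m, s)
    improved = True
    while improved:
        improved = False
        trial, unmoved = list(s), set(range(len(s)))
        pass_best_q, pass_best_s = best_q, None

        def q_after(i):
            trial[i] = -trial[i]          # tentatively move vertex i
            q = q_of(B, two_m, trial)
            trial[i] = -trial[i]          # undo the tentative move
            return q

        while unmoved:
            i = max(sorted(unmoved), key=q_after)  # best available move
            trial[i] = -trial[i]
            unmoved.remove(i)
            q = q_of(B, two_m, trial)
            if q > pass_best_q + 1e-12:
                pass_best_q, pass_best_s = q, list(trial)
        if pass_best_s is not None:       # keep the best state of this pass
            s, best_q, improved = pass_best_s, pass_best_q, True
    return s, best_q

# Hypothetical example: two triangles joined by (2, 3), vertex 2 misplaced.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
B, two_m = modularity_matrix(edges, 6)
start = [1, 1, -1, -1, -1, -1]
q_start = q_of(B, two_m, start)
s_final, q_final = refine(B, two_m, start)
```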

It is certainly possible that other refinement strategies 
might also give good results. For instance, the "extremal 
optimization" method explored by Duch and Arenas
for optimizing modularity could be employed as a refine- 
ment method by using the output of our spectral division 
as its starting point, rather than the random configura- 
tion used as a starting point by Duch and Arenas. 



VI. DIVIDING NETWORKS INTO MORE THAN TWO 
COMMUNITIES 

So far we have discussed primarily methods for dividing
networks into two communities. Many of the networks
we are concerned with, however, have more than
two communities. How can we generalize our methods
to this case? The simplest approach is repeated division
into two. That is, we use one of the methods described
above to divide our network in two and then divide those
parts in two again, and so forth. This approach was
described briefly in Ref. [32].

It is important to appreciate that upon further subdividing
a community within a network into two parts, the additional
contribution $\Delta Q$ to the modularity made by this
subdivision is not given correctly if we apply the algorithms
of Section IV to that community alone. That is, we cannot
simply write down the modularity matrix for the community in
question considered as a separate graph in its own right and
examine the leading eigenvector or eigenvectors. Instead we
proceed as follows. Let us denote the set of vertices in the
community to be divided by $G$ and let $n_G$ be the number of
vertices within this community. Now let $\mathbf{S}$ be an
$n_G \times c$ index matrix denoting the subdivision of the
community into $c$ subcommunities such that



$$ S_{ij} = \begin{cases} 1 & \text{if vertex $i$ belongs to subcommunity $j$,} \\ 0 & \text{otherwise.} \end{cases} \tag{49} $$

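Then, following Eq. (33), $\Delta Q$ is the difference between
the modularities of the network before and after subdivision
of the community thus:

$$ \Delta Q = \frac{1}{2m} \Bigl[ \sum_{i,j \in G} \sum_{k=1}^{c} B_{ij} S_{ik} S_{jk} - \sum_{i,j \in G} B_{ij} \Bigr] = \frac{1}{2m} \mathrm{Tr}\bigl( \mathbf{S}^T \mathbf{B}^{(G)} \mathbf{S} \bigr), \tag{50} $$

where $\mathbf{B}^{(G)}$ is an $n_G \times n_G$ generalized
modularity matrix with elements indexed by the vertex labels
$i,j$ of the vertices within group $G$ and having values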



$$ B^{(G)}_{ij} = B_{ij} - \delta_{ij} \sum_{l \in G} B_{il}, \tag{51} $$

with $B_{ij}$ defined by Eq. (27).

Equation (50) has the same form as our previous expression,
Eq. (33), for the modularity of the full network, and,
following the same argument as for Eqs. (35) to (38), we can
then show that optimization of the additional modularity
contribution from subdivision of a community can also be
expressed as a vector partitioning problem, just as before.
We can approximate this vector partitioning problem using
only the leading eigenvector as in Section IV.A or using more
than one vector as in Section IV.B. The resulting divisions
can also be optimized using a "refinement" stage as in
Section V.A to find the best possible modularity at each step.
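
The construction of the generalized modularity matrix and the single-eigenvector approximation to the subdivision can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, `B` is assumed to be the full modularity matrix stored as a dense NumPy array, `members` is a list of the vertex labels in $G$, and vertices whose eigenvector element is exactly zero are arbitrarily placed in the first group. For a two-way split described by a vector $\mathbf{s}$ of $\pm 1$ entries, the trace in Eq. (50) reduces to $\mathbf{s}^T \mathbf{B}^{(G)} \mathbf{s}/2$ because the rows of $\mathbf{B}^{(G)}$ sum to zero.

```python
import numpy as np

def generalized_modularity_matrix(B, members):
    """B^(G)_ij = B_ij - delta_ij * sum_{l in G} B_il, for i, j in G."""
    BG = B[np.ix_(members, members)].copy()
    BG -= np.diag(BG.sum(axis=1))
    return BG

def split_by_leading_eigenvector(B, members, two_m):
    """Return (s, delta_Q) for a two-way split of the community `members`."""
    BG = generalized_modularity_matrix(B, members)
    eigenvalues, eigenvectors = np.linalg.eigh(BG)   # ascending eigenvalues
    u = eigenvectors[:, -1]                          # leading eigenvector
    s = np.where(u >= 0, 1.0, -1.0)
    delta_q = float(s @ BG @ s) / (2.0 * two_m)      # s^T B^(G) s / 4m
    return s, delta_q
```

The returned `delta_q` is the change in the modularity of the whole network, so it can be compared directly with the difference of modularities computed before and after the split.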

FIG. 5 Division by the method of optimal modularity of a
simple network consisting of eight vertices in a line. (a) The
optimal division into just two parts separates the network
symmetrically into two groups of four vertices each. (b) The
optimal division into any number of parts divides the network
into three groups as shown here.

Using this method we can repeatedly subdivide communities
to partition networks into smaller and smaller groups of
vertices and in principle this process could continue until
the network is reduced to $n$ communities containing only a
single vertex each. Normally, however, we stop before this
point is reached because there is no point in subdividing a
community any further if no subdivision exists that will
increase the modularity of the network as a whole. The
appropriate strategy is to calculate explicitly the modularity
contribution $\Delta Q$ at each step in the subdivision of a
network, and to decline to subdivide any community for which
the value of $\Delta Q$ is not positive. Communities with the
property of having no subdivision that gives a positive
contribution to the modularity of the network as a whole we
call indivisible; the strategy described here is equivalent to
subdividing communities repeatedly until every remaining
community is indivisible.
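
The complete repeated-subdivision strategy might then look like the sketch below. It is illustrative rather than definitive: the function name and the tolerance `tol` are assumptions, only the single leading-eigenvector split is tried for each community, and no refinement stage is applied. A community is therefore treated as indivisible as soon as this one trial split fails to give a positive $\Delta Q$, which is a weaker test than the definition above.

```python
import numpy as np

def communities_by_repeated_bisection(A, tol=1e-10):
    """Bisect communities until no trial split has delta_Q > 0 (sketch)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    pending = [list(range(len(A)))]   # communities still to be examined
    final = []
    while pending:
        members = pending.pop()
        # Generalized modularity matrix of Eq. (51) for this community.
        BG = B[np.ix_(members, members)].copy()
        BG -= np.diag(BG.sum(axis=1))
        _, vecs = np.linalg.eigh(BG)
        s = np.where(vecs[:, -1] >= 0, 1.0, -1.0)
        delta_q = float(s @ BG @ s) / (2.0 * two_m)
        if delta_q <= tol:
            final.append(members)     # treated as indivisible
            continue
        pending.append([v for v, side in zip(members, s) if side > 0])
        pending.append([v for v, side in zip(members, s) if side < 0])
    return final
```

If every element of the eigenvector has the same sign, the trial split leaves the community whole and gives $\Delta Q = 0$, so the loop terminates without ever producing an empty group.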

This strategy appears to work very well in practice. It is,
however, not perfect (a conclusion we could draw under any
circumstances from the fact that it runs in polynomial time;
see above). In particular, it is certain that repeated
subdivision of a network into two parts will in some cases
fail to find the optimal modularity configuration. Consider,
for example, the (rather trivial) network shown in Fig. 5,
which consists of eight vertices connected together in a
line. By exhaustive enumeration we can show that, among
possible divisions of this network into only two parts, the
division indicated in Fig. 5a, right down the middle of the
line, is the one that gives the highest modularity. The
optimum modularity over divisions into any number of parts,
however, is achieved for the three-way division shown in
Fig. 5b. It is clear that if we first split the network as
shown in Fig. 5a, no subsequent subdivision of the network can
ever find the configuration of Fig. 5b, and hence our
algorithm will fail in this case to find the global optimum.
Nonetheless, the algorithm does appear to find divisions that
are close to optimal in most cases we have investigated.
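
The exhaustive enumeration for the eight-vertex line is small enough to reproduce directly. The sketch below is an illustration under simple assumptions: vertices are labeled 0 to 7 along the line, the helper names are hypothetical, and partitions are generated as restricted-growth label lists so that each division is counted once.

```python
import numpy as np

def modularity(A, labels):
    """Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(g_i, g_j)."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return float(B[same].sum()) / two_m

def all_partitions(n):
    """Yield every partition of n items as a restricted-growth label list."""
    def extend(labels, used):
        if len(labels) == n:
            yield list(labels)
            return
        for g in range(used + 1):
            labels.append(g)
            yield from extend(labels, max(used, g + 1))
            labels.pop()
    yield from extend([], 0)

n = 8
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0       # eight vertices in a line

scored = [(modularity(A, p), p) for p in all_partitions(n)]
best_two = max((qp for qp in scored if max(qp[1]) == 1), key=lambda qp: qp[0])
best_any = max(scored, key=lambda qp: qp[0])
print(best_two)   # labels [0,0,0,0,1,1,1,1], Q = 5/14, about 0.357
print(best_any)   # labels [0,0,0,1,1,2,2,2], Q = 37/98, about 0.378
```

The best two-way division is the symmetric 4|4 split, while the best division overall is the 3|2|3 split, whose middle group straddles the first cut and so cannot be reached by subdividing the two halves.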

Repeated subdivision is the approach we have taken to
multi-community divisions in our own work, but it is not the
only possible approach. In some respects a more satisfying
approach would be to work directly from the expression (37)
for the modularity of the complete network with a
multi-community division. Unfortunately, maximizing (37)
requires us to perform a vector partitioning into more than
two groups, a problem about whose solution rather little is
known. Some general observations are, however, worth making.
First, we note that the community vectors $\mathbf{R}_k$ in
the optimal solution of a vector partitioning problem always
have directions more than 90° apart. To demonstrate this, we
note that the change in the contribution to Eq. (37) if we
amalgamate two communities into one is

$$ |\mathbf{R}_1 + \mathbf{R}_2|^2 - \bigl( |\mathbf{R}_1|^2 + |\mathbf{R}_2|^2 \bigr) = 2\, \mathbf{R}_1 \cdot \mathbf{R}_2, \tag{52} $$

which is positive if the directions of $\mathbf{R}_1$ and
$\mathbf{R}_2$ are less than 90° apart. Thus we can always
increase the modularity by amalgamating a pair of communities
unless their vectors are more than 90° apart.

But the maximum number of directions more than 90° 
apart that can exist in a p-dimensional space is p + 1, 
which means that p + 1 is also the maximum number of 
communities we can find by optimizing a p-dimensional 
spectral approximation to the modularity. Thus if we use 
only a single eigenvector we will find at most two groups; 
if we use two we will find at most three groups, and so 
forth. So the choice of how many eigenvectors $p$ to work
with is determined to some extent by the network: if
the overall optimum modularity is for a division into $c$
groups, we will certainly fail to find that optimum if we
use fewer than $c - 1$ eigenvectors.

Second, we note that while true multi-way vector par- 
titioning may present problems, simple heuristics that 
group the vertex vectors together can still produce good 
results. For instance, White and Smyth [55] have applied
the standard technique of k-means clustering based
on group centroids to a different but related optimization
problem and have found good results. It is possible this
approach would work for our problem also if applied to
the centroids of the end-points of the vertex vectors. It is
also possible that an intrinsically vector-based variant of
k-means clustering could be created to tackle the vector
partitioning problem directly, although we are not aware 
of such an algorithm in the current vector partitioning 
literature. 



VII. NEGATIVE EIGENVALUES AND BIPARTITE 
STRUCTURE 

It is clear from the developments of the previous sec- 
tions that there is useful information about the structure 
of a network stored in the eigenvectors corresponding to 
the most positive eigenvalues of the modularity matrix. 
It is natural to ask whether there is also useful infor- 
mation in the eigenvectors corresponding to the negative 
eigenvalues and indeed it turns out that there is: the neg- 
ative eigenvalues and their eigenvectors contain informa- 
tion about a nontrivial type of "anti-community struc- 
ture" that is of substantial interest in some instances. 
Consider again the case in which we divide our network
into just two groups and look once more at Eq. (29),
which gives the modularity in this case. Suppose now
that instead of maximizing the terms involving the most
positive eigenvalues, we maximize the terms involving the
most negative ones. As we can easily see from the equation,
this is equivalent to minimizing rather than maximizing
the modularity.

What effect will this have on the divisions of the network
that we find? Large negative values of the modularity
correspond to divisions in which the number of edges within
groups is smaller than expected on the basis of chance, and
the number of edges between groups correspondingly bigger.
Figure 6 shows a sketch of a network having this property.
Such networks are said to be bipartite if there are no edges
at all within groups, or approximately bipartite if there are
a few within-group edges as in the figure. Bipartite or
approximately bipartite graphs have attracted some attention
in the recent literature. For instance, Kleinberg [j^ has
suggested that small bipartite subgraphs in the web graph may
be a signature of so-called hub/authority structure within
web communities, while Holme et al. [73] and Estrada and
Rodriguez-Velazquez [74] have independently devised measures
of bipartitivity and used them to analyze a variety of
real-world networks.

The arguments above suggest that we should be able to detect
bipartite or approximately bipartite structure in networks by
looking for divisions of the vertices that minimize
modularity. In the simplest approximation, we can do this by
focusing once more on just a single term in Eq. (29), that
corresponding to the most negative eigenvalue $\beta_n$, and
maximizing the coefficient of this eigenvalue by choosing
$s_i = -1$ for vertices having a negative element in the
corresponding eigenvector and $s_i = +1$ for the others. In
other words, we can achieve an approximation to the minimum
modularity division of the network by dividing vertices
according to the signs of the elements in the eigenvector
$\mathbf{u}_n$, and this division should correspond roughly
to the most nearly bipartite division. We can also append a
"refinement" step to the calculation, similar to that
described in Section V.A, in which, starting from the division
given by the eigenvector, we move single vertices between
groups in an effort to minimize the modularity further.
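
The single-eigenvector calculation just described takes only a few lines. The following is a minimal sketch, assuming a dense symmetric adjacency matrix and a hypothetical function name; the refinement step is omitted.

```python
import numpy as np

def most_bipartite_division(A):
    """Divide vertices by the signs of the eigenvector of B whose
    eigenvalue is most negative; return the division and its modularity."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    eigenvalues, eigenvectors = np.linalg.eigh(B)   # ascending eigenvalues
    u_n = eigenvectors[:, 0]                        # most negative eigenvalue
    s = np.where(u_n < 0, -1.0, 1.0)
    q = float(s @ B @ s) / (2.0 * two_m)            # Q = s^T B s / 4m
    return s, q
```

For a perfectly bipartite network such as the complete bipartite graph on 3 + 3 vertices, the division recovered is the bipartition itself and the modularity is $-1/2$.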




FIG. 7 (a) The network of commonly occurring English
adjectives (circles) and nouns (squares) described in the text.
(b) The same network redrawn with the nodes grouped so as
to minimize the modularity of the grouping. The network is
now revealed to be approximately bipartite, with one group
consisting almost entirely of adjectives and the other of nouns.



As an example of this type of calculation, consider
Fig. 7a, which shows a network representing juxtapositions
of words in a corpus of English text, in this case
the novel David Copperfield by Charles Dickens. To
construct this network, we have taken the 60 most commonly
occurring nouns in the novel and the 60 most commonly 
occurring adjectives. (The limit on the number of words 
is imposed solely to permit a clear visualization; there 
is no reason in principle why the analysis could not be 
extended to a much larger network.) The vertices in the 
network represent words and an edge connects any two 
words that appear adjacent to one another at any point 
in the book. Eight of the words never appear adjacent 
to any of the others and are excluded from the network, 
leaving a total of 112 vertices. 

Typically adjectives occur next to nouns in English. 
It is possible for adjectives to occur next to other adjec- 
tives ( "the big green bus" ) or for nouns to occur next to 
other nouns ("the big tour bus"), but these juxtaposi- 
tions are less common. Thus we would expect our net- 
work to be approximately bipartite in the sense described 
above: edges should run primarily between vertices
representing different types of words, with fewer edges
between vertices of the same type. One would be hard pressed
to tell this from Fig. 7a, however: the standard layout
algorithm used to draw the network completely fails to reveal
the structure present. Figure 7b shows what happens when we
divide the vertices by minimizing the modularity using the
method described above: a first division according to the
elements of the eigenvector with the most negative
eigenvalue, followed by a refinement stage to reduce the
modularity still further. It is now clear that the network is
in fact nearly bipartite, and the two groups found by the
algorithm correspond closely to the known groups of
adjectives and nouns, as indicated by the shapes of the
vertices. In all, 83% of the words are classified correctly
by this simple calculation.

Divisions with large negative modularity are, like those
with large positive modularity, not limited to having only
two groups. If we are interested purely in minimizing the
modularity we can in principle use as many groups as we like
to achieve that goal. A division with $k$ groups is called
k-partite if edges run only between groups and approximately
k-partite if there are a few within-group edges. One might
imagine that one could find k-partite structure in a network
just by looking
for divisions that minimize the number of within-group 
edges, but brief reflection persuades us that the optimum 
solution to this search problem is always to put each ver- 
tex in a group on its own, which automatically means 
that all edges lie between groups and none within groups. 
As with the ordinary community structure problem, the 
way to avoid this trivial solution is to concentrate not 
on the total number of edges within groups but on the 
difference between this number and the expected number 
of such edges. Thus, once again, we are led naturally to 
the consideration of modularity as a measure of the best 
way to divide a network. 

One way to minimize modularity over divisions into an
arbitrary number of groups is to proceed by analogy with our
earlier calculations of community structure and repeatedly
divide the network in two using the single-eigenvector method
above. Just as before, Eq. (50) gives the additional change
$\Delta Q$ in the modularity upon subdivision of a group in a
network, and the division process ends when the algorithm
fails to find any subdivision with $\Delta Q < 0$.
Alternatively, one can derive the analog of Eq. (37) and
thereby map the minimization of the modularity onto a vector
partitioning problem. The appropriate definition of the
vertex vectors turns out to be



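$$ [\mathbf{r}_i]_j = \sqrt{\alpha - \beta_{n+1-j}}\; U_{i,n+1-j}, \tag{53} $$

where $U_{ij}$ denotes the $i$th element of the eigenvector
$\mathbf{u}_j$ and $\alpha$ is a constant chosen sufficiently
large as to make $\alpha - \beta_j > 0$ for all terms in the
sum that we keep. Then the modularity is given by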



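$$ Q = n\alpha - \sum_{k=1}^{c} |\mathbf{R}_k|^2, \tag{54} $$

with the community vectors $\mathbf{R}_k$ defined, as before,
as the sums of the vertex vectors of the vertices in each
community.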



VIII. OTHER USES OF THE MODULARITY MATRIX 

One of the striking properties of the Laplacian matrix
is that, as described in Section II, it arises repeatedly in
various different areas of graph theory. It is natural to
ask whether the modularity matrix also crops up in other 
areas. In this section we describe briefly two other situa- 
tions in which the modularity matrix appears, although 
neither has been viewed in terms of this matrix in the 
past, as far as we are aware. 



A. Network correlations 

For our first example, suppose we have a quantity $x_i$
defined on the vertices $i = 1 \ldots n$ of a network, such as
degrees of vertices, ages of people in a social network,
numbers of hits on web pages, and so forth. And let
$\mathbf{x}$ be the $n$-component vector whose elements are
the $x_i$. Then consider the quantity

$$ r = \frac{1}{2m} \sum_{ij} \bigl[ A_{ij} - P_{ij} \bigr] x_i x_j = \frac{1}{2m}\, \mathbf{x}^T \mathbf{B} \mathbf{x}, \tag{55} $$

where here we will take the same definition (23) for our
null model $P_{ij}$ that we have been using throughout.
Observing that $\sum_{ij} A_{ij} = \sum_i k_i = 2m$, we can
rewrite $r$ as

$$ \begin{aligned} r &= \frac{1}{2m} \sum_{ij} \Bigl[ A_{ij} - \frac{k_i k_j}{2m} \Bigr] x_i x_j \\ &= \frac{\sum_{ij} A_{ij} x_i x_j}{\sum_{ij} A_{ij}} - \frac{\sum_i k_i x_i}{\sum_i k_i} \, \frac{\sum_j k_j x_j}{\sum_j k_j}. \end{aligned} \tag{56} $$

Note that the ratios appearing in the second line are simply
averages over all edges in the network, and hence $r$ has the
form $\langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle$
of a correlation function measuring the correlation of the
values $x_i$ over all pairs of vertices joined by an edge in
the network.
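
The equivalence of the matrix form and the edge-average form of $r$ is easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, the null model is taken to be $P_{ij} = k_i k_j/2m$, and the network is a dense symmetric adjacency matrix.

```python
import numpy as np

def edge_covariance(A, x):
    """r = (1/2m) x^T B x with B_ij = A_ij - k_i k_j / 2m."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    return float(x @ B @ x) / two_m

def edge_covariance_from_averages(A, x):
    """<x_i x_j> - <x_i><x_j>, with averages taken over the ends of edges."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    k = A.sum(axis=1)
    two_m = k.sum()
    mean_product = float(x @ A @ x) / two_m   # average of x_i x_j over edges
    mean_end = float(k @ x) / two_m           # average of x at the end of an edge
    return mean_product - mean_end ** 2
```

For a line of four vertices with $x_i$ equal to the degree, both functions give $r = -1/9$: the two end vertices of degree 1 attach only to vertices of degree 2, so the network is weakly disassortative by degree.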

Correlation functions of exactly this type have been
considered previously as measures of so-called "assortative
mixing," the tendency for adjacent vertices in networks to
have similar properties [53, 67]. For example, if the quantity
$x_i$ is just the degree $k_i$ of a vertex, then $r$ is the
covariance of the degrees of adjacent vertices, which takes
positive values if vertices tend to have similar degrees to
their neighbors, high-degree vertices linking to other
high-degree vertices and low to low, and negative values if
high-degree links to low.

Equation (55) is not just a curiosity, but provides some
insight concerning assortativity. If we expand $\mathbf{x}$
in terms of the eigenvectors $\mathbf{u}_i$ of the modularity
matrix, as we did for the modularity itself in Eq. (29), we get

$$ r = \frac{1}{2m} \sum_{i} c_i^2 \beta_i, \tag{57} $$

where $\beta_i$ is again the $i$th largest eigenvalue of
$\mathbf{B}$ and $c_i = \mathbf{u}_i^T \mathbf{x}$. Thus $r$
will have a large positive value if $\mathbf{x}$ has a large
component in the direction of one or more of the eigenvectors
with the most positive eigenvalues, and similarly for large
negative values. Now we recall that the leading eigenvectors
of the modularity matrix also define the communities in the
network and we see that there is a close relation between
assortativity and community structure: networks will be
assortative according to some property $x$ if the values of
that property divide along the same lines as the communities
in the network. Thus, for instance, a network will be
assortative by degree if the degrees of the vertices are
partitioned such that the high-degree vertices fall in one
community and the low-degree vertices in another.
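
The spectral form of $r$ can be evaluated as in the following sketch, which also returns the individual eigenvalues and coefficients so that one can see which eigenvectors dominate. The function name is hypothetical, and note that `numpy.linalg.eigh` returns eigenvalues in ascending order, the reverse of the numbering used in the text.

```python
import numpy as np

def covariance_from_spectrum(A, x):
    """r = (1/2m) * sum_i c_i^2 beta_i with c_i = u_i^T x."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    beta, U = np.linalg.eigh(B)     # columns of U are the eigenvectors u_i
    c = U.T @ x                     # expansion coefficients of x
    r = float(np.sum(c ** 2 * beta)) / two_m
    return r, beta, c
```

Each term `c[i]**2 * beta[i]` is the contribution of one eigenvector, so sorting these terms shows whether the property $x$ is aligned mainly with the community-like (positive) or bipartite-like (negative) end of the spectrum.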

This lends additional force to the discussion given in 
the introduction, where we mentioned that different com- 
munities in networks are often found to have different av- 
erage properties such as degree. In fact, as we now see, 
this is probably the case for any property that displays 
significant assortative mixing, which includes an enor- 
mous variety of quantities measured in networks of all 
types. Thus, it is not merely an observation that differ- 
ent communities have different average properties — it is 
an expected behavior in a network that has both com- 
munity structure and assortativity. 



B. Community centrality 

For our second example of other uses of the modularity
matrix, we consider centrality measures, one of the abiding
interests of the network analysis community for many decades.
In Section IV.A we argued that the magnitudes of the elements
of the leading eigenvector of the modularity matrix give a
measure of the "strength" with which vertices belong to their
assigned communities. Thus these magnitudes define a kind of
centrality index that quantifies how central vertices are in
communities. Focusing on just a single eigenvector of the
modularity matrix, however, is limiting. As we have seen, all
the eigenvectors contain useful information about community
structure. It is useful to ask what the appropriate measure
is of strength of community membership when the information
in all eigenvectors is taken into account. Given Eq. (37),
the obvious candidate seems to be the projection of the
vertex vector $\mathbf{r}_i$ onto the community vector
$\mathbf{R}_k$ of the community to which vertex $i$ belongs.
Unfortunately, this projection depends on the arbitrary
parameter $\alpha$, which we introduced in Eq. (35) to get
around problems caused by the negative eigenvalues of the
modularity matrix. This in turn threatens to introduce
arbitrariness into our centrality measure, which we would
prefer to avoid. So for the purposes of defining a centrality
index we propose a slightly different formulation of the
modularity, which is less appropriate for the optimization
calculations that are the main topic of this paper, but more
satisfactory for present purposes, as we will see.

Suppose that there are $p$ positive eigenvalues of the
modularity matrix and $q$ negative ones. We define two new
sets of vertex vectors $\{\mathbf{x}_i\}$ and
$\{\mathbf{y}_i\}$, of dimension $p$ and $q$, thus:

$$ [\mathbf{x}_i]_j = \sqrt{\beta_j}\; U_{ij}, \tag{58} $$

$$ [\mathbf{y}_i]_j = \sqrt{-\beta_{n+1-j}}\; U_{i,n+1-j}, \tag{59} $$

with $U_{ij}$ the $i$th element of the eigenvector
$\mathbf{u}_j$ and the eigenvalues numbered in decreasing
order. (Note that $p + q < n$ since there is always at least
one eigenvalue with value zero.) In terms of these vectors the
modularity can be written as

$$ Q = \sum_{k=1}^{c} |\mathbf{X}_k|^2 - \sum_{k=1}^{c} |\mathbf{Y}_k|^2, \tag{60} $$

where $G_k$ is once again the set of vertices in community $k$
and the community vectors $\mathbf{X}_k$ and $\mathbf{Y}_k$
are defined by

$$ \mathbf{X}_k = \sum_{i \in G_k} \mathbf{x}_i, \qquad \mathbf{Y}_k = \sum_{i \in G_k} \mathbf{y}_i. \tag{61} $$

This reformulation avoids the use of the arbitrary constant
$\alpha$, thereby making the vertex vectors dependent only on
the network structure and not on the way in which we choose
to represent it.

Equation (60) separates out the positive and negative
contributions to the modularity, the positive contributions
coming from vertices that have large corresponding elements
in the eigenvectors with positive eigenvalues, and conversely
for the negative contributions. The two contributions
correspond respectively to the traditional community
structure of Sections III and IV and to the bipartite or
k-partite structure discussed in Section VII. It is important
to notice that while obviously the overall modularity can
only be either positive or negative, it is entirely possible
for individual vertices to simultaneously make both large
positive and large negative contributions to that modularity.
Upon reflection, this is clearly reasonable: there is no
reason why a single vertex cannot have more connections than
expected within its own community and more connections than
expected to other communities. In a sense, Eq. (60) may be a
more fundamental representation of the modularity than
Eq. (37) because it makes this separation transparent, even
if it is in practice less suitable as a basis for modularity
optimization.

We can now define precisely the quantity that plays the role
previously played by the elements of the leading eigenvector
in the single-eigenvector approximation: it is the projection
of $\mathbf{x}_i$ onto the relevant community vector
$\mathbf{X}_k$, as we can see by writing the magnitude
$|\mathbf{X}_k|$ in Eq. (60) as

$$ |\mathbf{X}_k| = \hat{\mathbf{X}}_k^T \mathbf{X}_k = \sum_{i \in G_k} \hat{\mathbf{X}}_k^T \mathbf{x}_i, \tag{62} $$

where $\hat{\mathbf{X}}_k$ is the unit vector in the direction
of $\mathbf{X}_k$. Thus each vertex vector makes a
contribution to $|\mathbf{X}_k|$ equal to its projection onto
$\hat{\mathbf{X}}_k$. In the approximation where we ignore all
but the leading eigenvector, this projection reduces to the
(magnitude of) the appropriate element of that eigenvector,
as in Section IV.A.

The projection $\hat{\mathbf{X}}_k^T \mathbf{x}_i$ specifies
how central vertex $i$ is in its own community in the
traditional sense of having many connections within that
community. If this quantity is large then we will lose a
large positive contribution to the modularity if we move the
vertex to another community, which is to say that the vertex
is a strong member of its current community.

But there is also a second measure for each vertex, the
projection of $\mathbf{y}_i$ onto $\mathbf{Y}_k$. This
projection corresponds to a more unusual sort of centrality
which is high if vertex $i$ has many connections to others
outside its community. This "outsider" centrality measure
could also be useful in certain circumstances to identify
individuals with strong external connections.

These two projections, however, do not take precisely the
form that we expect of a centrality measure because they are
functions not only of the vertex itself (via $\mathbf{x}_i$
or $\mathbf{y}_i$) but also of the community in which it is
placed (via $\mathbf{X}_k$ or $\mathbf{Y}_k$). Instead,
therefore, let us consider the projection in the form
$|\mathbf{x}_i| \cos\theta_{ik}$, where $\theta_{ik}$ is the
angle between $\mathbf{x}_i$ and $\mathbf{X}_k$. The two parts
of this expression are both of interest. The first, the
magnitude $|\mathbf{x}_i|$, measures how large a positive
contribution vertex $i$ can potentially make to the
modularity. The vertex only actually makes a contribution
this large if the vertex vector is aligned with the community
vector, i.e., if the vertex is, in a sense, "in the middle" of
the community to which it belongs. Even a vertex for which
$|\mathbf{x}_i|$ is large may in practice make a small
positive contribution to the modularity if $\mathbf{x}_i$ is
almost perpendicular to $\mathbf{X}_k$, i.e., if the vertex is
"on the edge" of the community.

The second part of the projection, the $\cos\theta_{ik}$, is a
measure precisely of the vertex's position in the middle or
on the edge of its community. In the parlance of social
network analysis, the vertex is either in the core of its
community ($\cos\theta_{ik}$ near 1) or in the periphery
($\cos\theta_{ik}$ nearer 0). The cosine is a property both of
the vertex and of the community.

Let us focus here on the vector magnitudes and define two
centrality measures for vertices in a network equal to the
magnitudes of the vertex vectors $\mathbf{x}_i$ and
$\mathbf{y}_i$. (If we prefer, we could use $|\mathbf{x}_i|^2$
instead, which is slightly easier to calculate. If, as is
sometimes the case with centrality measures, we only care
about relative rankings of vertices, then the two are
equivalent.) These centralities are now properties of the
vertices alone and are independent of the way the network is
divided into communities. We notice, however, that
$|\mathbf{x}_i|$ and $|\mathbf{y}_i|$ are not independent
since

$$ |\mathbf{x}_i|^2 - |\mathbf{y}_i|^2 = \sum_{j=1}^{n} \beta_j U_{ij}^2 = B_{ii}. \tag{63} $$

Almost all networks considered in the literature are simple
graphs, meaning, among other things, that they have no
self-edges (edges that connect vertices to themselves) and
hence $A_{ii} = 0$ for all $i$. If the expected number of
self-edges $P_{ii}$ is also zero (as seems sensible), then
$B_{ii} = 0$ and we have $|\mathbf{x}_i| = |\mathbf{y}_i|$ for
all $i$. Thus there is actually only one centrality for simple
graphs, not two.

In fact, the choice (23) for $P_{ij}$ that we and other
authors have mostly used does allow self-edges (and is in
this sense slightly unrealistic; see H^), but
$P_{ii} = k_i^2/2m$ is typically small for most vertices if
$m$ is large (and indeed vanishes as $m \to \infty$ if degrees
are bounded), and hence it is still true to a good
approximation that $|\mathbf{x}_i| \simeq |\mathbf{y}_i|$ and
there is only one centrality.

In other words, we come to the nontrivial conclusion that
the vertices with the greatest capacity for making positive
contributions to the modularity also have the greatest
capacity for making negative contributions. The fundamental
meaning of this centrality measure is thus that there are
certain vertices that, as a consequence of their situation
within the network, have the power to make substantial
contributions, either positive or negative, to the overall
modularity of the network. For this reason, we call this
centrality measure community centrality. We define it to be
equal to the vector magnitude $|\mathbf{x}_i|$.
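
A direct way to compute the community centrality is from the eigendecomposition of the modularity matrix, as in the sketch below. It is a minimal illustration with assumed details: the function name is hypothetical, the null model is $P_{ij} = k_i k_j/2m$, and eigenvalues smaller in magnitude than the tolerance `tol` are treated as zero.

```python
import numpy as np

def community_centrality(A, tol=1e-10):
    """Return (|x_i|, |y_i|, B) for every vertex of the network A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()
    beta, U = np.linalg.eigh(B)
    pos = beta > tol                         # the p positive eigenvalues
    neg = beta < -tol                        # the q negative eigenvalues
    X = U[:, pos] * np.sqrt(beta[pos])       # row i is the vertex vector x_i
    Y = U[:, neg] * np.sqrt(-beta[neg])      # row i is the vertex vector y_i
    return np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1), B
```

The difference of the squared magnitudes equals the diagonal element $B_{ii}$, which for a simple graph with this null model is $-k_i^2/2m$; the two magnitudes therefore agree closely whenever $k_i^2 \ll 2m$.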

An alternative way to view the community centrality is to
consider how a vertex $i$ is situated among the other
vertices in its immediate vicinity, its neighborhood in the
network. If we were to artificially construct a community
from the vertices of this neighborhood, then that community
would presumably have a community vector $\mathbf{X}_k$ with
direction close to $\mathbf{x}_i$, and hence the magnitude
$|\mathbf{x}_i|$ would be a good measure of the actual
strength with which vertex $i$ belongs to the community. Thus
vertices with high community centrality are ones that play a
central role in their local neighborhood, regardless of where
the official community boundaries may lie. Conversely, even
when considered as the "center of its world" in this way,
vertex $i$ can never play a central role in its neighborhood
in this sense if $|\mathbf{x}_i|$ is small.

As an example, consider Fig. 8, which shows results for
community centrality for a network of coauthorships between
scientists, scientists in this case who are themselves
publishing on the topic of networks. The network is similar
to the one presented in Ref. [18] but is based on more recent
data, including publications up until early 2006 (see
footnote 7). The network has a total of 1589 scientists in it,
from a broad variety of fields, but only the 379 falling in
the largest connected component are shown in the figure. The
diameters of the vertices in the figure are proportional to
their community centrality (actually to $|\mathbf{x}_i|^2$;
see above), and the ten vertices having the highest
centralities are highlighted. A couple of remarks are worth
making about the results. First, without naming specific
names, we observe that all of the highlighted authors are
group leaders or senior researchers of groups working in this
area. Thus community centrality appears to live up to its
name in this admittedly anecdotal example: it highlights
those vertices that are central in their local communities.
Second, while the centrality is correlated with degree
($r^2 = 0.59$; see the inset figure), the two are not
perfectly correlated and in particular some vertices have
quite high centrality while having relatively low degree.
This emphasizes the point that high centrality is an
indicator of individuals who have more connections than
expected within their neighborhood (and hence potentially
make a large contribution to the modularity), rather than
simply having a lot of connections.

7 The vertices of the network represent all individuals who are
authors of papers cited in the bibliographies of either of two
recent reviews on networks research Q, y| and edges join every
pair of individuals whose names appear together as authors of
a paper or papers in those bibliographies. A small number of
additional references were added by hand to bring the network
up to date.

FIG. 8 A network of coauthorships between 379 scientists whose research centers on the properties of networks of one kind or
another. Vertex diameters indicate the community centrality and the ten vertices with highest centralities are highlighted. For
those readers curious about the identities of the vertices, an annotated version of this figure, names and all, can be found at
http://www.umich.edu/~mejn/centrality. Inset: a scatter plot of community centrality against vertex degrees. Like most
centrality measures, this one is correlated with degree, though only moderately strongly.



IX. CONCLUSIONS 

In this paper, we have studied the problem of detecting
community structure in networks. There is already a
substantial body of theory supporting the view that community
structure can be accurately quantified using the benefit
function known as modularity and hence that communities can
be detected by searching possible divisions of a network for
ones that possess high modularity. Here we have demonstrated
that the modularity can be succinctly expressed in terms of
the eigenvalues and eigenvectors of a matrix we call the
modularity matrix, which is a characteristic property of the
network and is itself independent of any division of the
network into communities. Using this expression we have
derived a series of further results including several new and
competitive algorithms for identifying communities, a method
for detecting bipartite or k-partite structure in networks,
and a new community centrality measure that identifies
vertices that play a central role in the communities to which
they belong.

We have demonstrated a variety of applications of our 
methods to real-world networks representing social, tech- 
nological, and information networks. These, however, are 
intended only as illustrations of the potential of these 
methods. We hope that readers will feel encouraged to 
apply these or similar methods to other networks of sci- 
entific interest and we look forward to seeing the results. 